diff --git a/API_REFERENCE.md b/API_REFERENCE.md
deleted file mode 100644
index 44ffd6d..0000000
--- a/API_REFERENCE.md
+++ /dev/null
@@ -1,460 +0,0 @@
-# Miku Discord Bot API Reference
-
-The Miku bot exposes a FastAPI REST API on port 3939 for controlling and monitoring the bot.
-
-## Base URL
-```
-http://localhost:3939
-```
-
-## API Endpoints
-
-### 📊 Status & Information
-
-#### `GET /status`
-Get current bot status and overview.
-
-**Response:**
-```json
-{
- "status": "online",
- "mood": "neutral",
- "servers": 2,
- "active_schedulers": 2,
- "server_moods": {
- "123456789": "bubbly",
- "987654321": "excited"
- }
-}
-```
-
-#### `GET /logs`
-Get the last 100 lines of bot logs.
-
-**Response:** Plain text log output
-
-#### `GET /prompt`
-Get the last full prompt sent to the LLM.
-
-**Response:**
-```json
-{
- "prompt": "Last prompt text..."
-}
-```
-
----
-
-### 😊 Mood Management
-
-#### `GET /mood`
-Get current DM mood.
-
-**Response:**
-```json
-{
- "mood": "neutral",
- "description": "Mood description text..."
-}
-```
-
-#### `POST /mood`
-Set DM mood.
-
-**Request Body:**
-```json
-{
- "mood": "bubbly"
-}
-```
-
-**Response:**
-```json
-{
- "status": "ok",
- "new_mood": "bubbly"
-}
-```
-
-#### `POST /mood/reset`
-Reset DM mood to neutral.
-
-#### `POST /mood/calm`
-Calm Miku down (set to neutral).
-
-#### `GET /servers/{guild_id}/mood`
-Get mood for specific server.
-
-#### `POST /servers/{guild_id}/mood`
-Set mood for specific server.
-
-**Request Body:**
-```json
-{
- "mood": "excited"
-}
-```
-
-#### `POST /servers/{guild_id}/mood/reset`
-Reset server mood to neutral.
-
-#### `GET /servers/{guild_id}/mood/state`
-Get complete mood state for server.
-
-#### `GET /moods/available`
-List all available moods.
-
-**Response:**
-```json
-{
- "moods": {
- "neutral": "😊",
- "bubbly": "🥰",
- "excited": "🤩",
- "sleepy": "😴",
- ...
- }
-}
-```
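-
-For example, a server's mood could be set from Python roughly like this (an illustrative sketch using the `requests` library; the guild ID is a placeholder and this helper is not part of the bot code):
-
-```python
-import requests
-
-guild_id = 123456789  # placeholder guild ID
-resp = requests.post(
-    f"http://localhost:3939/servers/{guild_id}/mood",
-    json={"mood": "excited"},
-)
-resp.raise_for_status()
-print(resp.json())
-```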
-
----
-
-### 😴 Sleep Management
-
-#### `POST /sleep`
-Force Miku to sleep.
-
-#### `POST /wake`
-Wake Miku up.
-
-#### `POST /bedtime?guild_id={guild_id}`
-Send bedtime reminder. If `guild_id` is provided, sends only to that server.
-
----
-
-### 🤖 Autonomous Actions
-
-#### `POST /autonomous/general?guild_id={guild_id}`
-Trigger autonomous general message.
-
-#### `POST /autonomous/engage?guild_id={guild_id}`
-Trigger autonomous user engagement.
-
-#### `POST /autonomous/tweet?guild_id={guild_id}`
-Trigger autonomous tweet sharing.
-
-#### `POST /autonomous/reaction?guild_id={guild_id}`
-Trigger autonomous reaction to a message.
-
-#### `POST /autonomous/custom?guild_id={guild_id}`
-Send custom autonomous message.
-
-**Request Body:**
-```json
-{
- "prompt": "Say something funny about cats"
-}
-```
-
-#### `GET /autonomous/stats`
-Get autonomous engine statistics for all servers.
-
-**Response:** Detailed stats including message counts, activity, mood profiles, etc.
-
-#### `GET /autonomous/v2/stats/{guild_id}`
-Get autonomous V2 stats for specific server.
-
-#### `GET /autonomous/v2/check/{guild_id}`
-Check if autonomous action should happen for server.
-
-#### `GET /autonomous/v2/status`
-Get autonomous V2 status across all servers.
-
----
-
-### 🌐 Server Management
-
-#### `GET /servers`
-List all configured servers.
-
-**Response:**
-```json
-{
- "servers": [
- {
- "guild_id": 123456789,
- "guild_name": "My Server",
- "autonomous_channel_id": 987654321,
- "autonomous_channel_name": "general",
- "bedtime_channel_ids": [111111111],
- "enabled_features": ["autonomous", "bedtime"]
- }
- ]
-}
-```
-
-#### `POST /servers`
-Add a new server configuration.
-
-**Request Body:**
-```json
-{
- "guild_id": 123456789,
- "guild_name": "My Server",
- "autonomous_channel_id": 987654321,
- "autonomous_channel_name": "general",
- "bedtime_channel_ids": [111111111],
- "enabled_features": ["autonomous", "bedtime"]
-}
-```
-
-#### `DELETE /servers/{guild_id}`
-Remove server configuration.
-
-#### `PUT /servers/{guild_id}`
-Update server configuration.
-
-#### `POST /servers/{guild_id}/bedtime-range`
-Set bedtime range for server.
-
-#### `POST /servers/{guild_id}/memory`
-Update server memory/context.
-
-#### `GET /servers/{guild_id}/memory`
-Get server memory/context.
-
-#### `POST /servers/repair`
-Repair server configurations.
-
----
-
-### 💬 DM Management
-
-#### `GET /dms/users`
-List all users with DM history.
-
-**Response:**
-```json
-{
- "users": [
- {
- "user_id": "123456789",
- "username": "User#1234",
- "total_messages": 42,
- "last_message_date": "2025-12-10T12:34:56",
- "is_blocked": false
- }
- ]
-}
-```
-
-#### `GET /dms/users/{user_id}`
-Get details for specific user.
-
-#### `GET /dms/users/{user_id}/conversations`
-Get conversation history for user.
-
-#### `GET /dms/users/{user_id}/search?query={query}`
-Search user's DM history.
-
-#### `GET /dms/users/{user_id}/export`
-Export user's DM history.
-
-#### `DELETE /dms/users/{user_id}`
-Delete user's DM data.
-
-#### `POST /dm/{user_id}/custom`
-Send custom DM (LLM-generated).
-
-**Request Body:**
-```json
-{
- "prompt": "Ask about their day"
-}
-```
-
-#### `POST /dm/{user_id}/manual`
-Send manual DM (direct message).
-
-**Form Data:**
-- `message`: Message text
-
-#### `GET /dms/blocked-users`
-List blocked users.
-
-#### `POST /dms/users/{user_id}/block`
-Block a user.
-
-#### `POST /dms/users/{user_id}/unblock`
-Unblock a user.
-
-#### `POST /dms/users/{user_id}/conversations/{conversation_id}/delete`
-Delete specific conversation.
-
-#### `POST /dms/users/{user_id}/conversations/delete-all`
-Delete all conversations for user.
-
-#### `POST /dms/users/{user_id}/delete-completely`
-Completely delete user data.
-
----
-
-### 📊 DM Analysis
-
-#### `POST /dms/analysis/run`
-Run analysis on all DM conversations.
-
-#### `POST /dms/users/{user_id}/analyze`
-Analyze specific user's DMs.
-
-#### `GET /dms/analysis/reports`
-Get all analysis reports.
-
-#### `GET /dms/analysis/reports/{user_id}`
-Get analysis report for specific user.
-
----
-
-### 🖼️ Profile Picture Management
-
-#### `POST /profile-picture/change?guild_id={guild_id}`
-Change profile picture. Optionally upload custom image.
-
-**Form Data:**
-- `file`: Image file (optional)
-
-**Response:**
-```json
-{
- "status": "ok",
- "message": "Profile picture changed successfully",
- "source": "danbooru",
- "metadata": {
- "url": "https://...",
- "tags": ["hatsune_miku", "...]
- }
-}
-```
-
-#### `GET /profile-picture/metadata`
-Get current profile picture metadata.
-
-#### `POST /profile-picture/restore-fallback`
-Restore original fallback profile picture.
-
----
-
-### 🎨 Role Color Management
-
-#### `POST /role-color/custom`
-Set custom role color.
-
-**Form Data:**
-- `hex_color`: Hex color code (e.g., "#FF0000")
-
-#### `POST /role-color/reset-fallback`
-Reset role color to fallback (#86cecb).
-
----
-
-### 💬 Conversation Management
-
-#### `GET /conversation/{user_id}`
-Get conversation history for user.
-
-#### `POST /conversation/reset`
-Reset conversation history.
-
-**Request Body:**
-```json
-{
- "user_id": "123456789"
-}
-```
-
----
-
-### 📨 Manual Messaging
-
-#### `POST /manual/send`
-Send manual message to channel.
-
-**Form Data:**
-- `message`: Message text
-- `channel_id`: Channel ID
-- `files`: Files to attach (optional, multiple)
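-
-A hedged sketch of calling this endpoint from Python with `requests` (file paths and IDs are placeholders, not part of the repo):
-
-```python
-import requests
-
-url = "http://localhost:3939/manual/send"
-data = {"message": "Check this out!", "channel_id": "987654321"}
-
-# Each tuple reuses the "files" form field name; attachments are optional
-with open("image.png", "rb") as img, open("document.pdf", "rb") as doc:
-    files = [("files", img), ("files", doc)]
-    resp = requests.post(url, data=data, files=files)
-
-print(resp.status_code, resp.json())
-```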
-
----
-
-### 🎁 Figurine Notifications
-
-#### `GET /figurines/subscribers`
-List figurine subscribers.
-
-#### `POST /figurines/subscribers`
-Add figurine subscriber.
-
-#### `DELETE /figurines/subscribers/{user_id}`
-Remove figurine subscriber.
-
-#### `POST /figurines/send_now`
-Send figurine notification to all subscribers.
-
-#### `POST /figurines/send_to_user`
-Send figurine notification to specific user.
-
----
-
-### 🖼️ Image Generation
-
-#### `POST /image/generate`
-Generate image using image generation service.
-
-#### `GET /image/status`
-Get image generation service status.
-
-#### `POST /image/test-detection`
-Test face detection on uploaded image.
-
----
-
-### 😀 Message Reactions
-
-#### `POST /messages/react`
-Add reaction to a message.
-
-**Request Body:**
-```json
-{
- "channel_id": "123456789",
- "message_id": "987654321",
- "emoji": "😊"
-}
-```
-
----
-
-## Error Responses
-
-All endpoints return errors in the following format:
-
-```json
-{
- "status": "error",
- "message": "Error description"
-}
-```
-
-HTTP status codes:
-- `200` - Success
-- `400` - Bad request
-- `404` - Not found
-- `500` - Internal server error
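-
-A minimal client-side sketch of how the documented error format and status codes might be handled (illustrative only, using the `requests` library; not part of the repo):
-
-```python
-import requests
-
-BASE_URL = "http://localhost:3939"  # default API address
-
-def get_status():
-    """Fetch bot status, surfacing the documented error format on failure."""
-    resp = requests.get(f"{BASE_URL}/status", timeout=10)
-    if resp.status_code != 200:
-        # Error bodies follow {"status": "error", "message": "..."}
-        detail = resp.json().get("message", resp.text)
-        raise RuntimeError(f"API error {resp.status_code}: {detail}")
-    return resp.json()
-
-if __name__ == "__main__":
-    print(get_status())
-```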
-
-## Authentication
-
-Currently, the API does not require authentication. It's designed to run on localhost within a Docker network.
-
-## Rate Limiting
-
-No rate limiting is currently implemented.
diff --git a/CHAT_INTERFACE_FEATURE.md b/CHAT_INTERFACE_FEATURE.md
deleted file mode 100644
index 86bf0a5..0000000
--- a/CHAT_INTERFACE_FEATURE.md
+++ /dev/null
@@ -1,296 +0,0 @@
-# Chat Interface Feature Documentation
-
-## Overview
-A new **"Chat with LLM"** tab has been added to the Miku bot Web UI, allowing you to chat directly with the language models with full streaming support (similar to ChatGPT).
-
-## Features
-
-### 1. Model Selection
-- **💬 Text Model (Fast)**: Chat with the text-based LLM for quick conversations
-- **👁️ Vision Model (Images)**: Use the vision model to analyze and discuss images
-
-### 2. System Prompt Options
-- **✅ Use Miku Personality**: Attach the standard Miku personality system prompt
- - Text model: Gets the full Miku character prompt (same as `query_llama`)
- - Vision model: Gets a simplified Miku-themed image analysis prompt
-- **❌ Raw LLM (No Prompt)**: Chat directly with the base LLM without any personality
- - Great for testing raw model responses
- - No character constraints
-
-### 3. Real-time Streaming
-- Messages stream in character-by-character like ChatGPT
-- Shows typing indicator while waiting for response
-- Smooth, responsive interface
-
-### 4. Vision Model Support
-- Upload images when using the vision model
-- Image preview before sending
-- Analyze images with Miku's personality or raw vision capabilities
-
-### 5. Chat Management
-- Clear chat history button
-- Timestamps on all messages
-- Color-coded messages (user vs assistant)
-- Auto-scroll to latest message
-- Keyboard shortcut: **Ctrl+Enter** to send messages
-
-## Technical Implementation
-
-### Backend (api.py)
-
-#### New Endpoint: `POST /chat/stream`
-```python
-# Accepts:
-{
- "message": "Your chat message",
- "model_type": "text" | "vision",
- "use_system_prompt": true | false,
- "image_data": "base64_encoded_image" (optional, for vision model)
-}
-
-# Returns: Server-Sent Events (SSE) stream
-data: {"content": "streamed text chunk"}
-data: {"done": true}
-data: {"error": "error message"}
-```
-
-**Key Features:**
-- Uses Server-Sent Events (SSE) for streaming
-- Supports both `TEXT_MODEL` and `VISION_MODEL` from globals
-- Dynamically switches system prompts based on configuration
-- Integrates with llama.cpp's streaming API
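-
-A minimal consumer sketch for this endpoint (assuming the `requests` library; field names follow the request/response format above):
-
-```python
-import json
-import requests
-
-payload = {
-    "message": "Hi Miku!",
-    "model_type": "text",
-    "use_system_prompt": True,
-    "image_data": None,
-}
-
-with requests.post("http://localhost:3939/chat/stream", json=payload, stream=True) as resp:
-    for line in resp.iter_lines(decode_unicode=True):
-        if not line or not line.startswith("data: "):
-            continue
-        event = json.loads(line[len("data: "):])
-        if event.get("done"):
-            break
-        if "error" in event:
-            raise RuntimeError(event["error"])
-        print(event.get("content", ""), end="", flush=True)
-```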
-
-### Frontend (index.html)
-
-#### New Tab: "💬 Chat with LLM"
-Located in the main navigation tabs (tab6)
-
-**Components:**
-1. **Configuration Panel**
- - Radio buttons for model selection
- - Radio buttons for system prompt toggle
- - Image upload section (shows/hides based on model)
- - Clear chat history button
-
-2. **Chat Messages Container**
- - Scrollable message history
- - Animated message appearance
- - Typing indicator during streaming
- - Color-coded messages with timestamps
-
-3. **Input Area**
- - Multi-line text input
- - Send button with loading state
- - Keyboard shortcuts
-
-**JavaScript Functions:**
-- `sendChatMessage()`: Handles message sending and streaming reception
-- `toggleChatImageUpload()`: Shows/hides image upload for vision model
-- `addChatMessage()`: Adds messages to chat display
-- `showTypingIndicator()` / `hideTypingIndicator()`: Typing animation
-- `clearChatHistory()`: Clears all messages
-- `handleChatKeyPress()`: Keyboard shortcuts
-
-## Usage Guide
-
-### Basic Text Chat with Miku
-1. Go to "💬 Chat with LLM" tab
-2. Ensure "💬 Text Model" is selected
-3. Ensure "✅ Use Miku Personality" is selected
-4. Type your message and click "📤 Send" (or press Ctrl+Enter)
-5. Watch as Miku's response streams in real-time!
-
-### Raw LLM Testing
-1. Select "💬 Text Model"
-2. Select "❌ Raw LLM (No Prompt)"
-3. Chat directly with the base language model without personality constraints
-
-### Vision Model Chat
-1. Select "👁️ Vision Model"
-2. Click "Upload Image" and select an image
-3. Type a message about the image (e.g., "What do you see in this image?")
-4. Click "📤 Send"
-5. The vision model will analyze the image and respond
-
-### Vision Model with Miku Personality
-1. Select "👁️ Vision Model"
-2. Keep "✅ Use Miku Personality" selected
-3. Upload an image
-4. Miku will analyze and comment on the image with her cheerful personality!
-
-## System Prompts
-
-### Text Model (with Miku personality)
-Uses the same comprehensive system prompt as `query_llama()`:
-- Full Miku character context
-- Current mood integration
-- Character consistency rules
-- Natural conversation guidelines
-
-### Vision Model (with Miku personality)
-Simplified prompt optimized for image analysis:
-```
-You are Hatsune Miku analyzing an image. Describe what you see naturally
-and enthusiastically as Miku would. Be detailed but conversational.
-React to what you see with Miku's cheerful, playful personality.
-```
-
-### No System Prompt
-Both models respond without personality constraints when this option is selected.
-
-## Streaming Technology
-
-The interface uses **Server-Sent Events (SSE)** for real-time streaming:
-- Backend sends chunked responses from llama.cpp
-- Frontend receives and displays chunks as they arrive
-- Smooth, ChatGPT-like experience
-- Works with both text and vision models
-
-## UI/UX Features
-
-### Message Styling
-- **User messages**: Green accent, right-aligned feel
-- **Assistant messages**: Blue accent, left-aligned feel
-- **Error messages**: Red accent with error icon
-- **Fade-in animation**: Smooth appearance for new messages
-
-### Responsive Design
-- Chat container scrolls automatically
-- Image preview for vision model
-- Loading states on buttons
-- Typing indicators
-- Custom scrollbar styling
-
-### Keyboard Shortcuts
-- **Ctrl+Enter**: Send message quickly
-- **Tab**: Navigate between input fields
-
-## Configuration Options
-
-All settings are preserved during the chat session:
-- Model type (text/vision)
-- System prompt toggle (Miku/Raw)
-- Uploaded image (for vision model)
-
-Settings do NOT persist after page refresh (fresh session each time).
-
-## Error Handling
-
-The interface handles various errors gracefully:
-- Connection failures
-- Model errors
-- Invalid image files
-- Empty messages
-- Timeout issues
-
-All errors are displayed in the chat with clear error messages.
-
-## Performance Considerations
-
-### Text Model
-- Fast responses (typically 1-3 seconds)
-- Streaming starts almost immediately
-- Low latency
-
-### Vision Model
-- Slower due to image processing
-- First token may take 3-10 seconds
-- Streaming continues once started
-- Image is sent inline as base64 with the request (no separate upload step)
-
-## Development Notes
-
-### File Changes
-1. **`bot/api.py`**
- - Added `from fastapi.responses import StreamingResponse`
- - Added `ChatMessage` Pydantic model
- - Added `POST /chat/stream` endpoint with SSE support
-
-2. **`bot/static/index.html`**
- - Added tab6 button in navigation
- - Added complete chat interface HTML
- - Added CSS styles for chat messages and animations
- - Added JavaScript functions for chat functionality
-
-### Dependencies
-- Uses existing `aiohttp` for HTTP streaming
-- Uses existing `globals.TEXT_MODEL` and `globals.VISION_MODEL`
-- Uses existing `globals.LLAMA_URL` for llama.cpp connection
-- No new dependencies required!
-
-## Future Enhancements (Ideas)
-
-Potential improvements for future versions:
-- [ ] Save/load chat sessions
-- [ ] Export chat history to file
-- [ ] Multi-user chat history (separate sessions per user)
-- [ ] Temperature and max_tokens controls
-- [ ] Model selection dropdown (if multiple models available)
-- [ ] Token count display
-- [ ] Voice input support
-- [ ] Markdown rendering in responses
-- [ ] Code syntax highlighting
-- [ ] Copy message button
-- [ ] Regenerate response button
-
-## Troubleshooting
-
-### "No response received from LLM"
-- Check if llama.cpp server is running
-- Verify `LLAMA_URL` in globals is correct
-- Check bot logs for connection errors
-
-### "Failed to read image file"
-- Ensure image is valid format (JPEG, PNG, GIF)
-- Check file size (large images may cause issues)
-- Try a different image
-
-### Streaming not working
-- Check browser console for JavaScript errors
-- Verify SSE is not blocked by proxy/firewall
-- Try refreshing the page
-
-### Model not responding
-- Check if correct model is loaded in llama.cpp
-- Verify model type matches what's configured
-- Check llama.cpp logs for errors
-
-## API Reference
-
-### POST /chat/stream
-
-**Request Body:**
-```json
-{
- "message": "string", // Required: User's message
- "model_type": "text|vision", // Required: Which model to use
- "use_system_prompt": boolean, // Required: Whether to add system prompt
- "image_data": "string|null" // Optional: Base64 image for vision model
-}
-```
-
-**Response:**
-```
-Content-Type: text/event-stream
-
-data: {"content": "Hello"}
-data: {"content": " there"}
-data: {"content": "!"}
-data: {"done": true}
-```
-
-**Error Response:**
-```
-data: {"error": "Error message here"}
-```
-
-## Conclusion
-
-The Chat Interface provides a powerful, user-friendly way to:
-- Test LLM responses interactively
-- Experiment with different prompting strategies
-- Analyze images with vision models
-- Chat with Miku's personality in real-time
-- Debug and understand model behavior
-
-All with a smooth, modern streaming interface that feels like ChatGPT! 🎉
diff --git a/CHAT_QUICK_START.md b/CHAT_QUICK_START.md
deleted file mode 100644
index 48dae12..0000000
--- a/CHAT_QUICK_START.md
+++ /dev/null
@@ -1,148 +0,0 @@
-# Chat Interface - Quick Start Guide
-
-## 🚀 Quick Start
-
-### Access the Chat Interface
-1. Open the Miku Control Panel in your browser
-2. Click on the **"💬 Chat with LLM"** tab
-3. Start chatting!
-
-## 📋 Configuration Options
-
-### Model Selection
-- **💬 Text Model**: Fast text conversations
-- **👁️ Vision Model**: Image analysis
-
-### System Prompt
-- **✅ Use Miku Personality**: Chat with Miku's character
-- **❌ Raw LLM**: Direct LLM without personality
-
-## 💡 Common Use Cases
-
-### 1. Chat with Miku
-```
-Model: Text Model
-System Prompt: Use Miku Personality
-Message: "Hi Miku! How are you feeling today?"
-```
-
-### 2. Test Raw LLM
-```
-Model: Text Model
-System Prompt: Raw LLM
-Message: "Explain quantum physics"
-```
-
-### 3. Analyze Images with Miku
-```
-Model: Vision Model
-System Prompt: Use Miku Personality
-Upload: [your image]
-Message: "What do you think of this image?"
-```
-
-### 4. Raw Image Analysis
-```
-Model: Vision Model
-System Prompt: Raw LLM
-Upload: [your image]
-Message: "Describe this image in detail"
-```
-
-## ⌨️ Keyboard Shortcuts
-- **Ctrl+Enter**: Send message
-
-## 🎨 Features
-- ✅ Real-time streaming (like ChatGPT)
-- ✅ Image upload for vision model
-- ✅ Color-coded messages
-- ✅ Timestamps
-- ✅ Typing indicators
-- ✅ Auto-scroll
-- ✅ Clear chat history
-
-## 🔧 System Prompts
-
-### Text Model with Miku
-- Full Miku personality
-- Current mood awareness
-- Character consistency
-
-### Vision Model with Miku
-- Miku analyzing images
-- Cheerful, playful descriptions
-
-### No System Prompt
-- Direct LLM responses
-- No character constraints
-
-## 📊 Message Types
-
-### User Messages (Green)
-- Your input
-- Right-aligned appearance
-
-### Assistant Messages (Blue)
-- Miku/LLM responses
-- Left-aligned appearance
-- Streams in real-time
-
-### Error Messages (Red)
-- Connection errors
-- Model errors
-- Clear error descriptions
-
-## 🎯 Tips
-
-1. **Use Ctrl+Enter** for quick sending
-2. **Select model first** before uploading images
-3. **Clear history** to start fresh conversations
-4. **Toggle system prompt** to compare responses
-5. **Wait for streaming** to complete before sending next message
-
-## 🐛 Troubleshooting
-
-### No response?
-- Check if llama.cpp is running
-- Verify network connection
-- Check browser console
-
-### Image not working?
-- Switch to Vision Model
-- Use valid image format (JPG, PNG)
-- Check file size
-
-### Slow responses?
-- Vision model is slower than text
-- Wait for streaming to complete
-- Check llama.cpp load
-
-## 📝 Examples
-
-### Example 1: Personality Test
-**With Miku Personality:**
-> User: "What's your favorite song?"
-> Miku: "Oh, I love so many songs! But if I had to choose, I'd say 'World is Mine' holds a special place in my heart! It really captures that fun, playful energy that I love! ✨"
-
-**Without System Prompt:**
-> User: "What's your favorite song?"
-> LLM: "I don't have personal preferences as I'm an AI language model..."
-
-### Example 2: Image Analysis
-**With Miku Personality:**
-> User: [uploads sunset image] "What do you see?"
-> Miku: "Wow! What a beautiful sunset! The sky is painted with such gorgeous oranges and pinks! It makes me want to write a song about it! The way the colors blend together is so dreamy and romantic~ 🌅💕"
-
-**Without System Prompt:**
-> User: [uploads sunset image] "What do you see?"
-> LLM: "This image shows a sunset landscape. The sky displays orange and pink hues. The sun is setting on the horizon. There are silhouettes of trees in the foreground."
-
-## 🎉 Enjoy Chatting!
-
-Have fun experimenting with different combinations of:
-- Text vs Vision models
-- With vs Without system prompts
-- Different types of questions
-- Various images (for vision model)
-
-The streaming interface makes it feel just like ChatGPT! 🚀
diff --git a/CLI_README.md b/CLI_README.md
deleted file mode 100644
index d2b66f5..0000000
--- a/CLI_README.md
+++ /dev/null
@@ -1,347 +0,0 @@
-# Miku CLI - Command Line Interface
-
-A powerful command-line interface for controlling and monitoring the Miku Discord bot.
-
-## Installation
-
-1. Make the script executable:
-```bash
-chmod +x miku-cli.py
-```
-
-2. Install dependencies:
-```bash
-pip install requests
-```
-
-3. (Optional) Create a symlink for easier access:
-```bash
-sudo ln -s $(pwd)/miku-cli.py /usr/local/bin/miku
-```
-
-## Quick Start
-
-```bash
-# Check bot status
-./miku-cli.py status
-
-# Get current mood
-./miku-cli.py mood --get
-
-# Set mood to bubbly
-./miku-cli.py mood --set bubbly
-
-# List available moods
-./miku-cli.py mood --list
-
-# Trigger autonomous message
-./miku-cli.py autonomous general
-
-# List servers
-./miku-cli.py servers
-
-# View logs
-./miku-cli.py logs
-```
-
-## Configuration
-
-By default, the CLI connects to `http://localhost:3939`. To use a different URL:
-
-```bash
-./miku-cli.py --url http://your-server:3939 status
-```
-
-## Commands
-
-### Status & Information
-
-```bash
-# Get bot status
-./miku-cli.py status
-
-# View recent logs
-./miku-cli.py logs
-
-# Get last LLM prompt
-./miku-cli.py prompt
-```
-
-### Mood Management
-
-```bash
-# Get current DM mood
-./miku-cli.py mood --get
-
-# Get server mood
-./miku-cli.py mood --get --server 123456789
-
-# Set mood
-./miku-cli.py mood --set bubbly
-./miku-cli.py mood --set excited --server 123456789
-
-# Reset mood to neutral
-./miku-cli.py mood --reset
-./miku-cli.py mood --reset --server 123456789
-
-# List available moods
-./miku-cli.py mood --list
-```
-
-### Sleep Management
-
-```bash
-# Put Miku to sleep
-./miku-cli.py sleep
-
-# Wake Miku up
-./miku-cli.py wake
-
-# Send bedtime reminder
-./miku-cli.py bedtime
-./miku-cli.py bedtime --server 123456789
-```
-
-### Autonomous Actions
-
-```bash
-# Trigger general autonomous message
-./miku-cli.py autonomous general
-./miku-cli.py autonomous general --server 123456789
-
-# Trigger user engagement
-./miku-cli.py autonomous engage
-./miku-cli.py autonomous engage --server 123456789
-
-# Share a tweet
-./miku-cli.py autonomous tweet
-./miku-cli.py autonomous tweet --server 123456789
-
-# Trigger reaction
-./miku-cli.py autonomous reaction
-./miku-cli.py autonomous reaction --server 123456789
-
-# Send custom autonomous message
-./miku-cli.py autonomous custom --prompt "Tell a joke about programming"
-./miku-cli.py autonomous custom --prompt "Say hello" --server 123456789
-
-# Get autonomous stats
-./miku-cli.py autonomous stats
-```
-
-### Server Management
-
-```bash
-# List all configured servers
-./miku-cli.py servers
-```
-
-### DM Management
-
-```bash
-# List users with DM history
-./miku-cli.py dm-users
-
-# Send custom DM (LLM-generated)
-./miku-cli.py dm-custom 123456789 "Ask them how their day was"
-
-# Send manual DM (direct message)
-./miku-cli.py dm-manual 123456789 "Hello! How are you?"
-
-# Block a user
-./miku-cli.py block 123456789
-
-# Unblock a user
-./miku-cli.py unblock 123456789
-
-# List blocked users
-./miku-cli.py blocked-users
-```
-
-### Profile Picture
-
-```bash
-# Change profile picture (search Danbooru based on mood)
-./miku-cli.py change-pfp
-
-# Change to custom image
-./miku-cli.py change-pfp --image /path/to/image.png
-
-# Change for specific server mood
-./miku-cli.py change-pfp --server 123456789
-
-# Get current profile picture metadata
-./miku-cli.py pfp-metadata
-```
-
-### Conversation Management
-
-```bash
-# Reset conversation history for a user
-./miku-cli.py reset-conversation 123456789
-```
-
-### Manual Messaging
-
-```bash
-# Send message to channel
-./miku-cli.py send 987654321 "Hello everyone!"
-
-# Send message with file attachments
-./miku-cli.py send 987654321 "Check this out!" --files image.png document.pdf
-```
-
-## Available Moods
-
-- 😊 neutral
-- 🥰 bubbly
-- 🤩 excited
-- 😴 sleepy
-- 😡 angry
-- 🙄 irritated
-- 😏 flirty
-- 💕 romantic
-- 🤔 curious
-- 😳 shy
-- 🤪 silly
-- 😢 melancholy
-- 😤 serious
-- 💤 asleep
-
-## Examples
-
-### Morning Routine
-```bash
-# Wake up Miku
-./miku-cli.py wake
-
-# Set a bubbly mood
-./miku-cli.py mood --set bubbly
-
-# Send a general message to all servers
-./miku-cli.py autonomous general
-
-# Change profile picture to match mood
-./miku-cli.py change-pfp
-```
-
-### Server-Specific Control
-```bash
-# Get server list
-./miku-cli.py servers
-
-# Set mood for specific server
-./miku-cli.py mood --set excited --server 123456789
-
-# Trigger engagement on that server
-./miku-cli.py autonomous engage --server 123456789
-```
-
-### DM Interaction
-```bash
-# List users
-./miku-cli.py dm-users
-
-# Send custom message
-./miku-cli.py dm-custom 123456789 "Ask them about their favorite anime"
-
-# If user is spamming, block them
-./miku-cli.py block 123456789
-```
-
-### Monitoring
-```bash
-# Check status
-./miku-cli.py status
-
-# View logs
-./miku-cli.py logs
-
-# Get autonomous stats
-./miku-cli.py autonomous stats
-
-# Check last prompt
-./miku-cli.py prompt
-```
-
-## Output Format
-
-The CLI uses emoji and colored output for better readability:
-
-- ✅ Success messages
-- ❌ Error messages
-- 😊 Mood indicators
-- 🌐 Server information
-- 💬 DM information
-- 📊 Statistics
-- 🖼️ Media information
-
-## Scripting
-
-The CLI is designed to be script-friendly:
-
-```bash
-#!/bin/bash
-
-# Morning routine script
-./miku-cli.py wake
-./miku-cli.py mood --set bubbly
-./miku-cli.py autonomous general
-
-# Wait 5 minutes
-sleep 300
-
-# Engage users
-./miku-cli.py autonomous engage
-```
-
-## Error Handling
-
-The CLI exits with status code 1 on errors and 0 on success, making it suitable for use in scripts:
-
-```bash
-if ./miku-cli.py mood --set bubbly; then
- echo "Mood set successfully"
-else
- echo "Failed to set mood"
-fi
-```
-
-## API Reference
-
-For complete API documentation, see [API_REFERENCE.md](./API_REFERENCE.md).
-
-## Troubleshooting
-
-### Connection Refused
-If you get "Connection refused" errors:
-1. Check that the bot API is running on port 3939
-2. Verify the URL with `--url` parameter
-3. Check Docker container status: `docker-compose ps`
-
-### Permission Denied
-Make the script executable:
-```bash
-chmod +x miku-cli.py
-```
-
-### Import Errors
-Install required dependencies:
-```bash
-pip install requests
-```
-
-## Future Enhancements
-
-Planned features:
-- Configuration file support (~/.miku-cli.conf)
-- Interactive mode
-- Tab completion
-- Color output control
-- JSON output mode for scripting
-- Batch operations
-- Watch mode for real-time monitoring
-
-## Contributing
-
-Feel free to extend the CLI with additional commands and features!
diff --git a/DUAL_GPU_BUILD_SUMMARY.md b/DUAL_GPU_BUILD_SUMMARY.md
deleted file mode 100644
index acf7430..0000000
--- a/DUAL_GPU_BUILD_SUMMARY.md
+++ /dev/null
@@ -1,184 +0,0 @@
-# Dual GPU Setup Summary
-
-## What We Built
-
-A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm.
-
-### Architecture
-
-```
-Primary GPU (NVIDIA GTX 1660) Secondary GPU (AMD RX 6800)
- ↓ ↓
- llama-swap (CUDA) llama-swap-amd (ROCm)
- Port: 8090 Port: 8091
- ↓ ↓
- NVIDIA models AMD models
- - llama3.1 - llama3.1-amd
- - darkidol - darkidol-amd
- - vision (MiniCPM) - moondream-amd
-```
-
-## Files Created
-
-1. **Dockerfile.llamaswap-rocm** - Custom multi-stage build:
- - Stage 1: Builds llama.cpp with ROCm from source
- - Stage 2: Builds llama-swap from source
- - Stage 3: Runtime image with both binaries
-
-2. **llama-swap-rocm-config.yaml** - Model configuration for AMD GPU
-
-3. **docker-compose.yml** - Updated with `llama-swap-amd` service
-
-4. **bot/utils/gpu_router.py** - Load balancing utility
-
-5. **bot/globals.py** - Updated with `LLAMA_AMD_URL`
-
-6. **setup-dual-gpu.sh** - Setup verification script
-
-7. **DUAL_GPU_SETUP.md** - Comprehensive documentation
-
-8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide
-
-## Why Custom Build?
-
-- llama.cpp doesn't publish ROCm Docker images (yet)
-- llama-swap doesn't provide ROCm variants
-- Building from source ensures latest ROCm compatibility
-- Full control over compilation flags and optimization
-
-## Build Time
-
-The initial build takes 15-30 minutes depending on your system:
-- llama.cpp compilation: ~10-20 minutes
-- llama-swap compilation: ~1-2 minutes
-- Image layering: ~2-5 minutes
-
-Subsequent builds are much faster due to Docker layer caching.
-
-## Next Steps
-
-Once the build completes:
-
-```bash
-# 1. Start both GPU services
-docker compose up -d llama-swap llama-swap-amd
-
-# 2. Verify both are running
-docker compose ps
-
-# 3. Test NVIDIA GPU
-curl http://localhost:8090/health
-
-# 4. Test AMD GPU
-curl http://localhost:8091/health
-
-# 5. Monitor logs
-docker compose logs -f llama-swap-amd
-
-# 6. Test model loading on AMD
-curl -X POST http://localhost:8091/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "llama3.1-amd",
- "messages": [{"role": "user", "content": "Hello!"}],
- "max_tokens": 50
- }'
-```
-
-## Device Access
-
-The AMD container has access to:
-- `/dev/kfd` - AMD GPU kernel driver
-- `/dev/dri` - Direct Rendering Infrastructure
-- Groups: `video`, `render`
-
-## Environment Variables
-
-RX 6800 specific settings:
-```yaml
-HSA_OVERRIDE_GFX_VERSION=10.3.0 # Navi 21 (gfx1030) compatibility
-ROCM_PATH=/opt/rocm
-HIP_VISIBLE_DEVICES=0 # Use first AMD GPU
-```
-
-## Bot Integration
-
-Your bot now has two endpoints available:
-
-```python
-import globals
-
-# NVIDIA GPU (primary)
-nvidia_url = globals.LLAMA_URL # http://llama-swap:8080
-
-# AMD GPU (secondary)
-amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080
-```
-
-Use the `gpu_router` utility for automatic load balancing:
-
-```python
-from bot.utils.gpu_router import get_llama_url_with_load_balancing
-
-# Round-robin between GPUs
-url, model = get_llama_url_with_load_balancing(task_type="text")
-
-# Prefer AMD for vision
-url, model = get_llama_url_with_load_balancing(
- task_type="vision",
- prefer_amd=True
-)
-```
-
-## Troubleshooting
-
-If the AMD container fails to start:
-
-1. **Check build logs:**
- ```bash
- docker compose build --no-cache llama-swap-amd
- ```
-
-2. **Verify GPU access:**
- ```bash
- ls -l /dev/kfd /dev/dri
- ```
-
-3. **Check container logs:**
- ```bash
- docker compose logs llama-swap-amd
- ```
-
-4. **Test GPU from host:**
- ```bash
- lspci | grep -i amd
- # Should show: Radeon RX 6800
- ```
-
-## Performance Notes
-
-**RX 6800 Specs:**
-- VRAM: 16GB
-- Architecture: RDNA 2 (Navi 21)
-- Compute: gfx1030
-
-**Recommended Models:**
-- Q4_K_M quantization: 5-6GB per model
-- Can load 2-3 models simultaneously
-- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2
-
-## Future Improvements
-
-1. **Automatic failover:** Route to AMD if NVIDIA is busy
-2. **Health monitoring:** Track GPU utilization
-3. **Dynamic routing:** Use least-busy GPU
-4. **VRAM monitoring:** Alert before OOM
-5. **Model preloading:** Keep common models loaded
-
-## Resources
-
-- [ROCm Documentation](https://rocmdocs.amd.com/)
-- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
-- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
-- [Full Setup Guide](./DUAL_GPU_SETUP.md)
-- [Quick Reference](./DUAL_GPU_QUICK_REF.md)
diff --git a/DUAL_GPU_QUICK_REF.md b/DUAL_GPU_QUICK_REF.md
deleted file mode 100644
index 0439379..0000000
--- a/DUAL_GPU_QUICK_REF.md
+++ /dev/null
@@ -1,194 +0,0 @@
-# Dual GPU Quick Reference
-
-## Quick Start
-
-```bash
-# 1. Run setup check
-./setup-dual-gpu.sh
-
-# 2. Build AMD container
-docker compose build llama-swap-amd
-
-# 3. Start both GPUs
-docker compose up -d llama-swap llama-swap-amd
-
-# 4. Verify
-curl http://localhost:8090/health # NVIDIA
-curl http://localhost:8091/health # AMD RX 6800
-```
-
-## Endpoints
-
-| GPU | Container | Port | Internal URL |
-|-----|-----------|------|--------------|
-| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 |
-| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 |
-
-## Models
-
-### NVIDIA GPU (Primary)
-- `llama3.1` - Llama 3.1 8B Instruct
-- `darkidol` - DarkIdol Uncensored 8B
-- `vision` - MiniCPM-V-4.5 (4K context)
-
-### AMD RX 6800 (Secondary)
-- `llama3.1-amd` - Llama 3.1 8B Instruct
-- `darkidol-amd` - DarkIdol Uncensored 8B
-- `moondream-amd` - Moondream2 Vision (2K context)
-
-## Commands
-
-### Start/Stop
-```bash
-# Start both
-docker compose up -d llama-swap llama-swap-amd
-
-# Start only AMD
-docker compose up -d llama-swap-amd
-
-# Stop AMD
-docker compose stop llama-swap-amd
-
-# Restart AMD with logs
-docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd
-```
-
-### Monitoring
-```bash
-# Container status
-docker compose ps
-
-# Logs
-docker compose logs -f llama-swap-amd
-
-# GPU usage
-watch -n 1 nvidia-smi # NVIDIA
-watch -n 1 rocm-smi # AMD
-
-# Resource usage
-docker stats llama-swap llama-swap-amd
-```
-
-### Testing
-```bash
-# List available models
-curl http://localhost:8091/v1/models | jq
-
-# Test text generation (AMD)
-curl -X POST http://localhost:8091/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "llama3.1-amd",
- "messages": [{"role": "user", "content": "Say hello!"}],
- "max_tokens": 20
- }' | jq
-
-# Test vision model (AMD)
-curl -X POST http://localhost:8091/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "moondream-amd",
- "messages": [{
- "role": "user",
- "content": [
- {"type": "text", "text": "Describe this image"},
- {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
- ]
- }],
- "max_tokens": 100
- }' | jq
-```
-
-## Bot Integration
-
-### Using GPU Router
-```python
-from bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model
-
-# Load balanced text generation
-url, model = get_llama_url_with_load_balancing(task_type="text")
-
-# Specific model
-url = get_endpoint_for_model("darkidol-amd")
-
-# Vision on AMD
-url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True)
-```
-
-### Direct Access
-```python
-import globals
-
-# AMD GPU
-amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080
-
-# NVIDIA GPU
-nvidia_url = globals.LLAMA_URL # http://llama-swap:8080
-```
-
-## Troubleshooting
-
-### AMD Container Won't Start
-```bash
-# Check ROCm
-rocm-smi
-
-# Check permissions
-ls -l /dev/kfd /dev/dri
-
-# Check logs
-docker compose logs llama-swap-amd
-
-# Rebuild
-docker compose build --no-cache llama-swap-amd
-```
-
-### Model Won't Load
-```bash
-# Check VRAM
-rocm-smi --showmeminfo vram
-
-# Lower GPU layers in llama-swap-rocm-config.yaml
-# Change: -ngl 99
-# To: -ngl 50
-```
-
-### GFX Version Error
-```bash
-# RX 6800 is gfx1030
-# Ensure in docker-compose.yml:
-HSA_OVERRIDE_GFX_VERSION=10.3.0
-```
-
-## Environment Variables
-
-Add to `docker-compose.yml` under `miku-bot` service:
-
-```yaml
-environment:
- - PREFER_AMD_GPU=true # Prefer AMD for load balancing
- - AMD_MODELS_ENABLED=true # Enable AMD models
- - LLAMA_AMD_URL=http://llama-swap-amd:8080
-```
-
-## Files
-
-- `Dockerfile.llamaswap-rocm` - ROCm container
-- `llama-swap-rocm-config.yaml` - AMD model config
-- `bot/utils/gpu_router.py` - Load balancing utility
-- `DUAL_GPU_SETUP.md` - Full documentation
-- `setup-dual-gpu.sh` - Setup verification script
-
-## Performance Tips
-
-1. **Model Selection**: Use Q4_K quantization for best size/quality balance
-2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models
-3. **TTL**: Adjust in config files (1800s = 30min default)
-4. **Context**: Lower context size (`-c 8192`) to save VRAM
-5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed
-
-## Support
-
-- ROCm Docs: https://rocmdocs.amd.com/
-- llama.cpp: https://github.com/ggml-org/llama.cpp
-- llama-swap: https://github.com/mostlygeek/llama-swap
diff --git a/DUAL_GPU_SETUP.md b/DUAL_GPU_SETUP.md
deleted file mode 100644
index 9ac9749..0000000
--- a/DUAL_GPU_SETUP.md
+++ /dev/null
@@ -1,321 +0,0 @@
-# Dual GPU Setup - NVIDIA + AMD RX 6800
-
-This document describes the dual-GPU configuration for running two llama-swap instances simultaneously:
-- **Primary GPU (NVIDIA)**: Runs main models via CUDA
-- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm
-
-## Architecture Overview
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Miku Bot │
-│ │
-│ LLAMA_URL=http://llama-swap:8080 (NVIDIA) │
-│ LLAMA_AMD_URL=http://llama-swap-amd:8080 (AMD RX 6800) │
-└─────────────────────────────────────────────────────────────┘
- │ │
- │ │
- ▼ ▼
- ┌──────────────────┐ ┌──────────────────┐
- │ llama-swap │ │ llama-swap-amd │
- │ (CUDA) │ │ (ROCm) │
- │ Port: 8090 │ │ Port: 8091 │
- └──────────────────┘ └──────────────────┘
- │ │
- ▼ ▼
- ┌──────────────────┐ ┌──────────────────┐
- │ NVIDIA GPU │ │ AMD RX 6800 │
- │ - llama3.1 │ │ - llama3.1-amd │
- │ - darkidol │ │ - darkidol-amd │
- │ - vision │ │ - moondream-amd │
- └──────────────────┘ └──────────────────┘
-```
-
-## Files Created
-
-1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for AMD GPU
-2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models
-3. **docker-compose.yml** - Updated with `llama-swap-amd` service
-
-## Configuration Details
-
-### llama-swap-amd Service
-
-```yaml
-llama-swap-amd:
- build:
- context: .
- dockerfile: Dockerfile.llamaswap-rocm
- container_name: llama-swap-amd
- ports:
- - "8091:8080" # External access on port 8091
- volumes:
- - ./models:/models
- - ./llama-swap-rocm-config.yaml:/app/config.yaml
- devices:
- - /dev/kfd:/dev/kfd # AMD GPU kernel driver
- - /dev/dri:/dev/dri # Direct Rendering Infrastructure
- group_add:
- - video
- - render
- environment:
- - HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 (Navi 21) compatibility
-```
-
-### Available Models on AMD GPU
-
-From `llama-swap-rocm-config.yaml`:
-
-- **llama3.1-amd** - Llama 3.1 8B text model
-- **darkidol-amd** - DarkIdol uncensored model
-- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized)
-
-### Model Aliases
-
-You can access AMD models using these aliases:
-- `llama3.1-amd`, `text-model-amd`, `amd-text`
-- `darkidol-amd`, `evil-model-amd`, `uncensored-amd`
-- `moondream-amd`, `vision-amd`, `moondream`
-
-## Usage
-
-### Building and Starting Services
-
-```bash
-# Build the AMD ROCm container
-docker compose build llama-swap-amd
-
-# Start both GPU services
-docker compose up -d llama-swap llama-swap-amd
-
-# Check logs
-docker compose logs -f llama-swap-amd
-```
-
-### Accessing AMD Models from Bot Code
-
-In your bot code, you can now use either endpoint:
-
-```python
-import globals
-
-# Use NVIDIA GPU (primary)
-nvidia_response = requests.post(
- f"{globals.LLAMA_URL}/v1/chat/completions",
- json={"model": "llama3.1", ...}
-)
-
-# Use AMD GPU (secondary)
-amd_response = requests.post(
- f"{globals.LLAMA_AMD_URL}/v1/chat/completions",
- json={"model": "llama3.1-amd", ...}
-)
-```
-
-### Load Balancing Strategy
-
-You can implement load balancing by:
-
-1. **Round-robin**: Alternate between GPUs for text generation
-2. **Task-specific**:
- - NVIDIA: Primary text + MiniCPM vision (heavy)
- - AMD: Secondary text + Moondream vision (lighter)
-3. **Failover**: Use AMD as backup if NVIDIA is busy
-
-Example load balancing function:
-
-```python
-import random
-import globals
-
-def get_llama_url(prefer_amd=False):
- """Get llama URL with optional load balancing"""
- if prefer_amd:
- return globals.LLAMA_AMD_URL
-
- # Random load balancing for text models
- return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL])
-```
-
-## Testing
-
-### Test NVIDIA GPU (Port 8090)
-```bash
-curl http://localhost:8090/health
-curl http://localhost:8090/v1/models
-```
-
-### Test AMD GPU (Port 8091)
-```bash
-curl http://localhost:8091/health
-curl http://localhost:8091/v1/models
-```
-
-### Test Model Loading (AMD)
-```bash
-curl -X POST http://localhost:8091/v1/chat/completions \
- -H "Content-Type: application/json" \
- -d '{
- "model": "llama3.1-amd",
- "messages": [{"role": "user", "content": "Hello from AMD GPU!"}],
- "max_tokens": 50
- }'
-```
-
-## Monitoring
-
-### Check GPU Usage
-
-**AMD GPU:**
-```bash
-# ROCm monitoring
-rocm-smi
-
-# Or from host
-watch -n 1 rocm-smi
-```
-
-**NVIDIA GPU:**
-```bash
-nvidia-smi
-watch -n 1 nvidia-smi
-```
-
-### Check Container Resource Usage
-```bash
-docker stats llama-swap llama-swap-amd
-```
-
-## Troubleshooting
-
-### AMD GPU Not Detected
-
-1. Verify ROCm is installed on host:
- ```bash
- rocm-smi --version
- ```
-
-2. Check device permissions:
- ```bash
- ls -l /dev/kfd /dev/dri
- ```
-
-3. Verify RX 6800 compatibility:
- ```bash
- rocminfo | grep "Name:"
- ```
-
-### Model Loading Issues
-
-If models fail to load on AMD:
-
-1. Check VRAM availability:
- ```bash
- rocm-smi --showmeminfo vram
- ```
-
-2. Adjust `-ngl` (GPU layers) in config if needed:
- ```yaml
- # Reduce GPU layers for smaller VRAM
- cmd: /app/llama-server ... -ngl 50 ... # Instead of 99
- ```
-
-3. Check container logs:
- ```bash
- docker compose logs llama-swap-amd
- ```
-
-### GFX Version Mismatch
-
-RX 6800 is Navi 21 (gfx1030). If you see GFX errors:
-
-```bash
-# Set in docker-compose.yml environment:
-HSA_OVERRIDE_GFX_VERSION=10.3.0
-```
-
-### llama-swap Build Issues
-
-If the ROCm container fails to build:
-
-1. The Dockerfile attempts to build llama-swap from source
-2. Alternative: Use pre-built binary or simpler proxy setup
-3. Check build logs: `docker compose build --no-cache llama-swap-amd`
-
-## Performance Considerations
-
-### Memory Usage
-
-- **RX 6800**: 16GB VRAM
- - Q4_K_M/Q4_K_XL models: ~5-6GB each
- - Can run 2 models simultaneously or 1 with long context
-
-### Model Selection
-
-**Best for AMD RX 6800:**
-- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB)
-- ✅ Moondream2 vision (smaller, efficient)
-- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM)
-
-### TTL Configuration
-
-Adjust model TTL in `llama-swap-rocm-config.yaml`:
-- Lower TTL = more aggressive unloading = more VRAM available
-- Higher TTL = less model swapping = faster response times
-
-## Advanced: Model-Specific Routing
-
-Create a helper function to route models automatically:
-
-```python
-# bot/utils/gpu_router.py
-import globals
-
-MODEL_TO_GPU = {
- # NVIDIA models
- "llama3.1": globals.LLAMA_URL,
- "darkidol": globals.LLAMA_URL,
- "vision": globals.LLAMA_URL,
-
- # AMD models
- "llama3.1-amd": globals.LLAMA_AMD_URL,
- "darkidol-amd": globals.LLAMA_AMD_URL,
- "moondream-amd": globals.LLAMA_AMD_URL,
-}
-
-def get_endpoint_for_model(model_name):
- """Get the correct llama-swap endpoint for a model"""
- return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL)
-
-def is_amd_model(model_name):
- """Check if model runs on AMD GPU"""
- return model_name.endswith("-amd")
-```
-
-## Environment Variables
-
-Add these to control GPU selection:
-
-```yaml
-# In docker-compose.yml
-environment:
- - LLAMA_URL=http://llama-swap:8080
- - LLAMA_AMD_URL=http://llama-swap-amd:8080
- - PREFER_AMD_GPU=false # Set to true to prefer AMD for general tasks
- - AMD_MODELS_ENABLED=true # Enable/disable AMD models
-```
-
-## Future Enhancements
-
-1. **Automatic load balancing**: Monitor GPU utilization and route requests
-2. **Health checks**: Fallback to primary GPU if AMD fails
-3. **Model distribution**: Automatically assign models to GPUs based on VRAM
-4. **Performance metrics**: Track response times per GPU
-5. **Dynamic routing**: Use least-busy GPU for new requests
-
-## References
-
-- [ROCm Documentation](https://rocmdocs.amd.com/)
-- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
-- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
-- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html)
diff --git a/ERROR_HANDLING_QUICK_REF.md b/ERROR_HANDLING_QUICK_REF.md
deleted file mode 100644
index 6a9342e..0000000
--- a/ERROR_HANDLING_QUICK_REF.md
+++ /dev/null
@@ -1,78 +0,0 @@
-# Error Handling Quick Reference
-
-## What Changed
-
-When Miku encounters an error (like "Error 502" from llama-swap), she now says:
-```
-"Someone tell Koko-nii there is a problem with my AI."
-```
-
-And sends you a webhook notification with full error details.
-
-## Webhook Details
-
-**Webhook URL**: `https://discord.com/api/webhooks/1462216811293708522/...`
-**Mentions**: @Koko-nii (User ID: 344584170839236608)
-
-## Error Notification Format
-
-```
-🚨 Miku Bot Error
-━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
-Error Message:
- Error: 502
-
-User: username#1234
-Channel: #general
-Server: Guild ID: 123456789
-User Prompt:
- Hi Miku! How are you?
-
-Exception Type: HTTPError
-Traceback:
- [Full Python traceback]
-```
-
-## Files Changed
-
-1. **NEW**: `bot/utils/error_handler.py`
- - Main error handling logic
- - Webhook notifications
- - Error detection
-
-2. **MODIFIED**: `bot/utils/llm.py`
- - Added error handling to `query_llama()`
- - Prevents errors in conversation history
- - Catches all exceptions and HTTP errors
-
-3. **NEW**: `bot/test_error_handler.py`
- - Test suite for error detection
- - 26 test cases
-
-4. **NEW**: `ERROR_HANDLING_SYSTEM.md`
- - Full documentation
-
-## Testing
-
-```bash
-cd /home/koko210Serve/docker/miku-discord/bot
-python test_error_handler.py
-```
-
-Expected: ✓ All 26 tests passed!
-
-## Coverage
-
-✅ Works with both llama-swap (NVIDIA) and llama-swap-rocm (AMD)
-✅ Handles all message types (DMs, server messages, autonomous)
-✅ Catches connection errors, timeouts, HTTP errors
-✅ Prevents errors from polluting conversation history
-
-## No Changes Required
-
-No configuration changes needed. The system is automatically active for:
-- All direct messages to Miku
-- All server messages mentioning Miku
-- All autonomous messages
-- All LLM queries via `query_llama()`
diff --git a/ERROR_HANDLING_SYSTEM.md b/ERROR_HANDLING_SYSTEM.md
deleted file mode 100644
index 11b75a9..0000000
--- a/ERROR_HANDLING_SYSTEM.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Error Handling System
-
-## Overview
-
-The Miku bot now includes a comprehensive error handling system that catches errors from the llama-swap containers (both NVIDIA and AMD) and provides user-friendly responses while notifying the bot administrator.
-
-## Features
-
-### 1. Error Detection
-The system automatically detects various types of errors including:
-- HTTP error codes (502, 500, 503, etc.)
-- Connection errors (refused, timeout, failed)
-- LLM server errors
-- Timeout errors
-- Generic error messages
-
-### 2. User-Friendly Responses
-When an error is detected, instead of showing technical error messages like "Error 502" or "Sorry, there was an error", Miku will respond with:
-
-> **"Someone tell Koko-nii there is a problem with my AI."**
-
-This keeps Miku in character and provides a better user experience.
-
-### 3. Administrator Notifications
-When an error occurs, a webhook notification is automatically sent to Discord with:
-- **Error Message**: The full error text from the container
-- **Context Information**:
- - User who triggered the error
- - Channel/Server where the error occurred
- - User's prompt that caused the error
- - Exception type (if applicable)
- - Full traceback (if applicable)
-- **Mention**: Automatically mentions Koko-nii for immediate attention
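-
-A rough sketch of how such a notification could be posted to the Discord webhook (payload fields are illustrative; the real formatting lives in `send_error_webhook()` in `bot/utils/error_handler.py`):
-
-```python
-import requests
-
-WEBHOOK_URL = "https://discord.com/api/webhooks/..."  # configured webhook URL
-ADMIN_ID = "344584170839236608"  # Koko-nii's user ID
-
-def send_error_webhook(error_msg: str, user: str, channel: str, prompt: str) -> None:
-    """Post an error summary to Discord, mentioning the administrator."""
-    content = (
-        f"<@{ADMIN_ID}> 🚨 **Miku Bot Error**\n"
-        f"**Error Message:** {error_msg}\n"
-        f"**User:** {user}\n"
-        f"**Channel:** {channel}\n"
-        f"**User Prompt:** {prompt}"
-    )
-    requests.post(WEBHOOK_URL, json={"content": content}, timeout=10)
-```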
-
-### 4. Conversation History Protection
-Error messages are NOT saved to conversation history, preventing errors from polluting the context for future interactions.
-
-## Implementation Details
-
-### Files Modified
-
-1. **`bot/utils/error_handler.py`** (NEW)
- - Core error detection and webhook notification logic
- - `is_error_response()`: Detects error messages using regex patterns
- - `handle_llm_error()`: Handles exceptions from the LLM
- - `handle_response_error()`: Handles error responses from the LLM
- - `send_error_webhook()`: Sends formatted error notifications
-
-2. **`bot/utils/llm.py`**
- - Integrated error handling into `query_llama()` function
- - Catches all exceptions and HTTP errors
- - Filters responses to detect error messages
- - Prevents error messages from being saved to history
-
-### Webhook URL
-```
-https://discord.com/api/webhooks/1462216811293708522/4kdGenpxZFsP0z3VBgebYENODKmcRrmEzoIwCN81jCirnAxuU2YvxGgwGCNBb6TInA9Z
-```
-
-## Error Detection Patterns
-
-The system detects errors using the following patterns:
-- `Error: XXX` or `Error XXX` (with HTTP status codes)
-- `XXX Error` format
-- "Sorry, there was an error"
-- "Sorry, the response took too long"
-- Connection-related errors (refused, timeout, failed)
-- Server errors (service unavailable, internal server error, bad gateway)
-- HTTP status codes >= 400
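-
-A simplified sketch of the kind of matching `is_error_response()` performs (illustrative; the actual patterns live in `bot/utils/error_handler.py`):
-
-```python
-import re
-
-ERROR_PATTERNS = [
-    r"\berror:?\s*\d{3}\b",        # "Error: 502", "Error 502"
-    r"\b\d{3}\s+error\b",          # "502 Error"
-    r"sorry, there was an error",
-    r"sorry, the response took too long",
-    r"connection (refused|timed out|failed)",
-    r"service unavailable|internal server error|bad gateway",
-]
-
-def is_error_response(text: str) -> bool:
-    """Return True if an LLM reply looks like an error message."""
-    lowered = text.lower()
-    return any(re.search(pattern, lowered) for pattern in ERROR_PATTERNS)
-```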
-
-## Coverage
-
-The error handler is automatically applied to:
-- ✅ Direct messages to Miku
-- ✅ Server messages mentioning Miku
-- ✅ Autonomous messages (general, engaging users, tweets)
-- ✅ Conversation joining
-- ✅ All responses using `query_llama()`
-- ✅ Both NVIDIA and AMD GPU containers
-
-## Testing
-
-A test suite is included in `bot/test_error_handler.py` that validates the error detection logic with 26 test cases covering:
-- Various error message formats
-- Normal responses (should NOT be detected as errors)
-- HTTP status codes
-- Edge cases
-
-Run tests with:
-```bash
-cd /home/koko210Serve/docker/miku-discord/bot
-python test_error_handler.py
-```
-
-## Example Scenarios
-
-### Scenario 1: llama-swap Container Down
-**User**: "Hi Miku!"
-**Without Error Handler**: "Error: 502"
-**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
-**Webhook Notification**: Sent with full error details
-
-### Scenario 2: Connection Timeout
-**User**: "Tell me a story"
-**Without Error Handler**: "Sorry, the response took too long. Please try again."
-**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
-**Webhook Notification**: Sent with timeout exception details
-
-### Scenario 3: LLM Server Error
-**User**: "How are you?"
-**Without Error Handler**: "Error: Internal server error"
-**With Error Handler**: "Someone tell Koko-nii there is a problem with my AI."
-**Webhook Notification**: Sent with HTTP 500 error details
-
-## Benefits
-
-1. **Better User Experience**: Users see a friendly, in-character message instead of technical errors
-2. **Immediate Notifications**: Administrator is notified immediately via Discord webhook
-3. **Detailed Context**: Full error information is provided for debugging
-4. **Clean History**: Errors don't pollute conversation history
-5. **Consistent Handling**: All error types are handled uniformly
-6. **Container Agnostic**: Works with both NVIDIA and AMD containers
-
-## Future Enhancements
-
-Potential improvements:
-- Add retry logic for transient errors
-- Track error frequency to detect systemic issues
-- Automatic container restart if errors persist
-- Error categorization (transient vs. critical)
-- Rate limiting on webhook notifications to prevent spam
diff --git a/INTERRUPTION_DETECTION.md b/INTERRUPTION_DETECTION.md
deleted file mode 100644
index f6e7ae5..0000000
--- a/INTERRUPTION_DETECTION.md
+++ /dev/null
@@ -1,311 +0,0 @@
-# Intelligent Interruption Detection System
-
-## Implementation Complete ✅
-
-Added sophisticated interruption detection that prevents response queueing and allows natural conversation flow.
-
----
-
-## Features
-
-### 1. **Intelligent Interruption Detection**
-Detects when user speaks over Miku with configurable thresholds:
-- **Time threshold**: 0.8 seconds of continuous speech
-- **Chunk threshold**: 8+ audio chunks (160ms worth)
-- **Smart calculation**: Both conditions must be met to prevent false positives
-
-### 2. **Graceful Cancellation**
-When interruption is detected:
-- ✅ Stops LLM streaming immediately (`miku_speaking = False`)
-- ✅ Cancels TTS playback
-- ✅ Flushes audio buffers
-- ✅ Ready for next input within milliseconds
-
-### 3. **History Tracking**
-Maintains conversation context:
-- Adds `[INTERRUPTED - user started speaking]` marker to history
-- **Does NOT** add incomplete response to history
-- LLM sees the interruption in context for next response
-- Prevents confusion about what was actually said
-
-### 4. **Queue Prevention**
-- If user speaks while Miku is talking **but not long enough to interrupt**:
- - Input is **ignored** (not queued)
- - User sees: `"(talk over Miku longer to interrupt)"`
- - Prevents "yeah" x5 = 5 responses problem
-
----
-
-## How It Works
-
-### Detection Algorithm
-
-```
-User speaks during Miku's turn
- ↓
-Track: start_time, chunk_count
- ↓
-Each audio chunk increments counter
- ↓
-Check thresholds:
- - Duration >= 0.8s?
- - Chunks >= 8?
- ↓
- Both YES → INTERRUPT!
- ↓
-Stop LLM stream, cancel TTS, mark history
-```
-
-### Threshold Calculation
-
-**Audio chunks**: Discord sends 20ms chunks @ 16kHz (320 samples)
-- 8 chunks = 160ms of actual audio
-- But over 800ms timespan = sustained speech
-
-**Why both conditions?**
-- Time only: Background noise could trigger
-- Chunks only: Gaps in speech could fail
-- Both together: Reliable detection of intentional speech
-
----
-
-## Configuration
-
-### Interruption Thresholds
-
-Edit `bot/utils/voice_receiver.py`:
-
-```python
-# Interruption detection
-self.interruption_threshold_time = 0.8 # seconds
-self.interruption_threshold_chunks = 8 # minimum chunks
-```
-
-**Recommendations**:
-- **More sensitive** (interrupt faster): `0.5s / 6 chunks`
-- **Current** (balanced): `0.8s / 8 chunks`
-- **Less sensitive** (only clear interruptions): `1.2s / 12 chunks`
-
-### Silence Timeout
-
-The silence detection (when to finalize transcript) was also adjusted:
-
-```python
-self.silence_timeout = 1.0 # seconds (was 1.5s)
-```
-
-Faster silence detection = more responsive conversations!
-
----
-
-## Conversation History Format
-
-### Before Interruption
-```python
-[
- {"role": "user", "content": "koko210: Tell me a long story"},
- {"role": "assistant", "content": "Once upon a time in a digital world..."},
-]
-```
-
-### After Interruption
-```python
-[
- {"role": "user", "content": "koko210: Tell me a long story"},
- {"role": "assistant", "content": "[INTERRUPTED - user started speaking]"},
- {"role": "user", "content": "koko210: Actually, tell me something else"},
- {"role": "assistant", "content": "Sure! What would you like to hear about?"},
-]
-```
-
-The `[INTERRUPTED]` marker gives the LLM context that the conversation was cut off.
-
----
-
-## Testing Scenarios
-
-### Test 1: Basic Interruption
-1. `!miku listen`
-2. Say: "Tell me a very long story about your concerts"
-3. **While Miku is speaking**, talk over her for 1+ second
-4. **Expected**: TTS stops, LLM stops, Miku listens to your new input
-
-### Test 2: Short Talk-Over (No Interruption)
-1. Miku is speaking
-2. Say a quick "yeah" or "uh-huh" (< 0.8s)
-3. **Expected**: Ignored, Miku continues speaking, message: "(talk over Miku longer to interrupt)"
-
-### Test 3: Multiple Queued Inputs (PREVENTED)
-1. Miku is speaking
-2. Say "yeah" 5 times quickly
-3. **Expected**: All ignored except one that might interrupt
-4. **OLD BEHAVIOR**: Would queue 5 responses ❌
-5. **NEW BEHAVIOR**: Ignores them ✅
-
-### Test 4: Conversation History
-1. Start conversation
-2. Interrupt Miku mid-sentence
-3. Ask: "What were you saying?"
-4. **Expected**: Miku should acknowledge she was interrupted
-
----
-
-## User Experience
-
-### What Users See
-
-**Normal conversation:**
-```
-🎤 koko210: "Hey Miku, how are you?"
-💭 Miku is thinking...
-🎤 Miku: "I'm doing great! How about you?"
-```
-
-**Quick talk-over (ignored):**
-```
-🎤 Miku: "I'm doing great! How about..."
-💬 koko210 said: "yeah" (talk over Miku longer to interrupt)
-🎤 Miku: "...you? I hope you're having a good day!"
-```
-
-**Successful interruption:**
-```
-🎤 Miku: "I'm doing great! How about..."
-⚠️ koko210 interrupted Miku
-🎤 koko210: "Actually, can you sing something?"
-💭 Miku is thinking...
-```
-
----
-
-## Technical Details
-
-### Interruption Detection Flow
-
-```python
-# In voice_receiver.py _send_audio_chunk()
-
-if miku_speaking:
- if user_id not in interruption_start_time:
- # First chunk during Miku's speech
- interruption_start_time[user_id] = current_time
- interruption_audio_count[user_id] = 1
- else:
- # Increment chunk count
- interruption_audio_count[user_id] += 1
-
- # Calculate duration
- duration = current_time - interruption_start_time[user_id]
- chunks = interruption_audio_count[user_id]
-
- # Check threshold
- if duration >= 0.8 and chunks >= 8:
- # INTERRUPT!
- trigger_interruption(user_id)
-```
-
-### Cancellation Flow
-
-```python
-# In voice_manager.py on_user_interruption()
-
-1. Set miku_speaking = False
- → LLM streaming loop checks this and breaks
-
-2. Call _cancel_tts()
- → Stops voice_client playback
- → Sends /interrupt to RVC server
-
-3. Add history marker
- → {"role": "assistant", "content": "[INTERRUPTED]"}
-
-4. Ready for next input!
-```
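-
-The same flow as a hedged code sketch - attribute and helper names are assumptions based on this document, not verbatim `voice_manager.py` code:
-
-```python
-async def on_user_interruption(self, user_id: int):
-    """Sketch only: stop generation/playback and mark the history."""
-    # 1. Stop the LLM streaming loop (it checks this flag every iteration)
-    self.miku_speaking = False
-
-    # 2. Stop TTS playback and flush buffers on the RVC server
-    await self._cancel_tts()
-
-    # 3. Record the interruption instead of the incomplete response
-    self.conversation_history.append(
-        {"role": "assistant", "content": "[INTERRUPTED - user started speaking]"}
-    )
-    # 4. Nothing to queue - the next final transcript is handled normally
-```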
-
----
-
-## Performance
-
-- **Detection latency**: ~20-40ms (1-2 audio chunks)
-- **Cancellation latency**: ~50-100ms (TTS stop + buffer clear)
-- **Total response time**: ~100-150ms from speech start to Miku stopping
-- **False positive rate**: Very low with dual threshold system
-
----
-
-## Monitoring
-
-### Check Interruption Logs
-```bash
-docker logs -f miku-bot | grep "interrupted"
-```
-
-**Expected output**:
-```
-🛑 User 209381657369772032 interrupted Miku (duration=1.2s, chunks=15)
-✓ Interruption handled, ready for next input
-```
-
-### Debug Interruption Detection
-```bash
-docker logs -f miku-bot | grep "interruption"
-```
-
-### Check for Queued Responses (should be none!)
-```bash
-docker logs -f miku-bot | grep "Ignoring new input"
-```
-
----
-
-## Edge Cases Handled
-
-1. **Multiple users interrupting**: Each user tracked independently
-2. **Rapid speech then silence**: Interruption tracking resets when Miku stops
-3. **Network packet loss**: Opus decode errors don't affect tracking
-4. **Container restart**: Tracking state cleaned up properly
-5. **Miku finishes naturally**: Interruption tracking cleared
-
----
-
-## Files Modified
-
-1. **bot/utils/voice_receiver.py**
- - Added interruption tracking dictionaries
- - Added detection logic in `_send_audio_chunk()`
- - Cleanup interruption state in `stop_listening()`
- - Configurable thresholds at init
-
-2. **bot/utils/voice_manager.py**
- - Updated `on_user_interruption()` to handle graceful cancel
- - Added history marker for interruptions
- - Modified `_generate_voice_response()` to not save incomplete responses
- - Added queue prevention in `on_final_transcript()`
- - Reduced silence timeout to 1.0s
-
----
-
-## Benefits
-
-✅ **Natural conversation flow**: No more awkward queued responses
-✅ **Responsive**: Miku stops quickly when interrupted
-✅ **Context-aware**: History tracks interruptions
-✅ **False-positive resistant**: Dual threshold prevents accidental triggers
-✅ **User-friendly**: Clear feedback about what's happening
-✅ **Performant**: Minimal latency, efficient tracking
-
----
-
-## Future Enhancements
-
-- [ ] **Adaptive thresholds** based on user speech patterns
-- [ ] **Volume-based detection** (interrupt faster if user speaks loudly)
-- [ ] **Context-aware responses** (Miku acknowledges interruption more naturally)
-- [ ] **User preferences** (some users may want different sensitivity)
-- [ ] **Multi-turn interruption** (handle rapid back-and-forth better)
-
----
-
-**Status**: ✅ **DEPLOYED AND READY FOR TESTING**
-
-Try interrupting Miku mid-sentence - she should stop gracefully and listen to your new input!
diff --git a/README.md b/README.md
deleted file mode 100644
index 5296d38..0000000
--- a/README.md
+++ /dev/null
@@ -1,535 +0,0 @@
-# 🎤 Miku Discord Bot 💙
-
-
-
-
-[Docker](https://www.docker.com/) • [Python](https://www.python.org/) • [discord.py](https://discordpy.readthedocs.io/)
-
-*The world's #1 Virtual Idol, now in your Discord server! 🌱✨*
-
-[Features](#-features) • [Quick Start](#-quick-start) • [Architecture](#️-architecture) • [API](#-api-endpoints) • [Contributing](#-contributing)
-
-
-
----
-
-## 🌟 About
-
-Meet **Hatsune Miku** - a fully-featured, AI-powered Discord bot that brings the cheerful, energetic virtual idol to life in your server! Powered by local LLMs (Llama 3.1), vision models (MiniCPM-V), and a sophisticated autonomous behavior system, Miku can chat, react, share content, and even change her profile picture based on her mood!
-
-### Why This Bot?
-
-- 🎭 **14 Dynamic Moods** - From bubbly to melancholy, Miku's personality adapts
-- 🤖 **Smart Autonomous Behavior** - Context-aware decisions without spamming
-- 👁️ **Vision Capabilities** - Analyzes images, videos, and GIFs in conversations
-- 🎨 **Auto Profile Pictures** - Fetches & crops anime faces from Danbooru based on mood
-- 💬 **DM Support** - Personal conversations with mood tracking
-- 🐦 **Twitter Integration** - Shares Miku-related tweets and figurine announcements
-- 🎮 **ComfyUI Integration** - Natural language image generation requests
-- 🔊 **Voice Chat Ready** - Fish.audio TTS integration (docs included)
-- 📊 **RESTful API** - Full control via HTTP endpoints
-- 🐳 **Production Ready** - Docker Compose with GPU support
-
----
-
-## ✨ Features
-
-### 🧠 AI & LLM Integration
-
-- **Local LLM Processing** with Llama 3.1 8B (via llama.cpp + llama-swap)
-- **Automatic Model Switching** - Text ↔️ Vision models swap on-demand
-- **OpenAI-Compatible API** - Easy migration and integration
-- **Conversation History** - Per-user context with RAG-style retrieval
-- **Smart Prompting** - Mood-aware system prompts with personality profiles
-
-### 🎭 Mood & Personality System
-
-<details>
-<summary>14 Available Moods (click to expand)</summary>
-
-- 😊 **Neutral** - Classic cheerful Miku
-- 😴 **Asleep** - Sleepy and minimally responsive
-- 😪 **Sleepy** - Getting tired, simple responses
-- 🎉 **Excited** - Extra energetic and enthusiastic
-- 💫 **Bubbly** - Playful and giggly
-- 🤔 **Curious** - Inquisitive and wondering
-- 😳 **Shy** - Blushing and hesitant
-- 🤪 **Silly** - Goofy and fun-loving
-- 😠 **Angry** - Frustrated or upset
-- 😤 **Irritated** - Mildly annoyed
-- 😢 **Melancholy** - Sad and reflective
-- 😏 **Flirty** - Playful and teasing
-- 💕 **Romantic** - Sweet and affectionate
-- 🎯 **Serious** - Focused and thoughtful
-
-</details>
-
-- **Per-Server Mood Tracking** - Different moods in different servers
-- **DM Mood Persistence** - Separate mood state for private conversations
-- **Automatic Mood Shifts** - Responds to conversation sentiment
-
-### 🤖 Autonomous Behavior System V2
-
-The bot features a sophisticated **context-aware decision engine** that makes Miku feel alive:
-
-- **Intelligent Activity Detection** - Tracks message frequency, user presence, and channel activity
-- **Non-Intrusive** - Won't spam or interrupt important conversations
-- **Mood-Based Personality** - Behavioral patterns change with mood
-- **Multiple Action Types**:
- - 💬 General conversation starters
- - 👋 Engaging specific users
- - 🐦 Sharing Miku tweets
- - 💬 Joining ongoing conversations
- - 🎨 Changing profile pictures
- - 😊 Reacting to messages
-
-**Rate Limiting**: Minimum 30-second cooldown between autonomous actions to prevent spam.
-
-### 👁️ Vision & Media Processing
-
-- **Image Analysis** - Describe images shared in chat using MiniCPM-V 4.5
-- **Video Understanding** - Extracts frames and analyzes video content
-- **GIF Support** - Processes animated GIFs (converts to MP4 if needed)
-- **Embed Content Extraction** - Reads Twitter/X embeds without API
-- **Face Detection** - On-demand anime face detection service (GPU-accelerated)
-
-### 🎨 Dynamic Profile Picture System
-
-- **Danbooru Integration** - Searches for Miku artwork
-- **Smart Cropping** - Automatic face detection and 1:1 crop
-- **Mood-Based Selection** - Filters by tags matching current mood
-- **Quality Filtering** - Only uses high-quality, safe-rated images
-- **Fallback System** - Graceful degradation if detection fails
-
-### 🐦 Twitter Features
-
-- **Tweet Sharing** - Automatically fetches and shares Miku-related tweets
-- **Figurine Notifications** - DM subscribers about new Miku figurine releases
-- **Embed Compatibility** - Uses fxtwitter for better Discord previews
-- **Duplicate Prevention** - Tracks sent tweets to avoid repeats
-
-### 🎮 ComfyUI Image Generation
-
-- **Natural Language Detection** - "Draw me as Miku swimming in a pool"
-- **Workflow Integration** - Connects to external ComfyUI instance
-- **Smart Prompting** - Enhances user requests with context
-
-### 📡 REST API Dashboard
-
-Full-featured FastAPI server with endpoints for:
-- Mood management (get/set/reset)
-- Conversation history
-- Autonomous actions (trigger manually)
-- Profile picture updates
-- Server configuration
-- DM analysis reports
-
-### 🔧 Developer Features
-
-- **Docker Compose Setup** - One command deployment
-- **GPU Acceleration** - NVIDIA runtime for models and face detection
-- **Health Checks** - Automatic service monitoring
-- **Volume Persistence** - Conversation history and settings saved
-- **Hot Reload** - Update without restarting (for development)
-
----
-
-## 🚀 Quick Start
-
-### Prerequisites
-
-- **Docker** & **Docker Compose** installed
-- **NVIDIA GPU** with CUDA support (for model inference)
-- **Discord Bot Token** ([Create one here](https://discord.com/developers/applications))
-- At least **8GB VRAM** recommended (4GB minimum)
-
-### Installation
-
-1. **Clone the repository**
- ```bash
- git clone https://github.com/yourusername/miku-discord.git
- cd miku-discord
- ```
-
-2. **Set up your bot token**
-
- Edit `docker-compose.yml` and replace the `DISCORD_BOT_TOKEN`:
- ```yaml
- environment:
- - DISCORD_BOT_TOKEN=your_token_here
- - OWNER_USER_ID=your_discord_user_id # For DM reports
- ```
-
-3. **Add your models**
-
- Place these GGUF models in the `models/` directory:
- - `Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf` (text model)
- - `MiniCPM-V-4_5-Q3_K_S.gguf` (vision model)
- - `MiniCPM-V-4_5-mmproj-f16.gguf` (vision projector)
-
-4. **Launch the bot**
- ```bash
- docker-compose up -d
- ```
-
-5. **Check logs**
- ```bash
- docker-compose logs -f miku-bot
- ```
-
-6. **Access the dashboard**
-
- Open http://localhost:3939 in your browser
-
-### Optional: ComfyUI Integration
-
-If you have ComfyUI running, update the path in `docker-compose.yml`:
-```yaml
-volumes:
- - /path/to/your/ComfyUI/output:/app/ComfyUI/output:ro
-```
-
-### Optional: Face Detection Service
-
-Start the anime face detector when needed:
-```bash
-docker-compose --profile tools up -d anime-face-detector
-```
-
-Access Gradio UI at http://localhost:7860
-
----
-
-## 🏗️ Architecture
-
-### Service Overview
-
-```
-┌─────────────────────────────────────────────────────────────┐
-│ Discord API │
-└───────────────────────┬─────────────────────────────────────┘
- │
- ▼
-┌─────────────────────────────────────────────────────────────┐
-│ Miku Bot (Python) │
-│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
-│ │ Discord │ │ FastAPI │ │ Autonomous │ │
-│ │ Event Loop │ │ Server │ │ Engine │ │
-│ └──────────────┘ └──────────────┘ └──────────────┘ │
-└───────────┬────────────────┬────────────────┬──────────────┘
- │ │ │
- ▼ ▼ ▼
-┌─────────────────┐ ┌─────────────────┐ ┌──────────────┐
-│ llama-swap │ │ ComfyUI │ │ Face Detector│
-│ (Model Server) │ │ (Image Gen) │ │ (On-Demand) │
-│ │ │ │ │ │
-│ • Llama 3.1 │ │ • Workflows │ │ • Gradio UI │
-│ • MiniCPM-V │ │ • GPU Accel │ │ • FastAPI │
-│ • Auto-swap │ │ │ │ │
-└─────────────────┘ └─────────────────┘ └──────────────┘
- │
- ▼
- ┌──────────┐
- │ Models │
- │ (GGUF) │
- └──────────┘
-```
-
-### Tech Stack
-
-| Component | Technology |
-|-----------|-----------|
-| **Bot Framework** | Discord.py 2.0+ |
-| **LLM Backend** | llama.cpp + llama-swap |
-| **Text Model** | Llama 3.1 8B Instruct |
-| **Vision Model** | MiniCPM-V 4.5 |
-| **API Server** | FastAPI + Uvicorn |
-| **Image Gen** | ComfyUI (external) |
-| **Face Detection** | Anime-Face-Detector (Gradio) |
-| **Database** | JSON files (conversation history, settings) |
-| **Containerization** | Docker + Docker Compose |
-| **GPU Runtime** | NVIDIA Container Toolkit |
-
-### Key Components
-
-#### 1. **llama-swap** (Model Server)
-- Automatically loads/unloads models based on requests
-- Prevents VRAM exhaustion by swapping between text and vision models
-- OpenAI-compatible `/v1/chat/completions` endpoint (example call below)
-- Configurable TTL (time-to-live) per model
-
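-Because the endpoint is OpenAI-compatible, any standard client can talk to it. A minimal sketch with `requests`, assuming the default `LLAMA_URL` and the `llama3.1` model name from `llama-swap-config.yaml` (adjust the host if you call it from outside the compose network):
-
-```python
-import requests
-
-resp = requests.post(
-    "http://llama-swap:8080/v1/chat/completions",  # LLAMA_URL default
-    json={
-        "model": "llama3.1",  # requesting "vision" instead triggers a model swap
-        "messages": [{"role": "user", "content": "Say hi as Hatsune Miku!"}],
-    },
-    timeout=120,
-)
-print(resp.json()["choices"][0]["message"]["content"])
-```
-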
-#### 2. **Autonomous Engine V2**
-- Tracks message activity, user presence, and channel engagement
-- Calculates "engagement scores" per server
-- Makes context-aware decisions without LLM overhead
-- Personality profiles per mood (e.g., shy mood = less engaging)
-
-#### 3. **Server Manager**
-- Per-guild configuration (mood, sleep state, autonomous settings)
-- Scheduled tasks (bedtime reminders, autonomous ticks)
-- Persistent storage in `servers_config.json`
-
-#### 4. **Conversation History**
-- Vector-based RAG (Retrieval Augmented Generation)
-- Stores last 50 messages per user
-- Semantic search using FAISS (sketched below)
-- Context injection for continuity
-
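-A rough sketch of the retrieval idea - the embedding backend is an assumption (`sentence-transformers` here), since this README doesn't pin it down:
-
-```python
-import faiss
-import numpy as np
-from sentence_transformers import SentenceTransformer
-
-embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder, any works
-messages = ["I love negi!", "Tell Your World is my favorite song", "Concerts are so fun"]
-
-vectors = np.asarray(embedder.encode(messages), dtype="float32")
-index = faiss.IndexFlatL2(vectors.shape[1])
-index.add(vectors)
-
-# Pull the 2 stored messages most relevant to the new prompt
-query = np.asarray(embedder.encode(["What food does Miku like?"]), dtype="float32")
-_, ids = index.search(query, 2)
-context = [messages[i] for i in ids[0]]  # injected into the LLM prompt for continuity
-```
-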
----
-
-## 📡 API Endpoints
-
-The bot runs a FastAPI server on port **3939** with the following endpoints:
-
-### Mood Management
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/servers/{guild_id}/mood` | GET | Get current mood for server |
-| `/servers/{guild_id}/mood` | POST | Set mood (body: `{"mood": "excited"}`) |
-| `/servers/{guild_id}/mood/reset` | POST | Reset to neutral mood |
-| `/mood` | GET | Get DM mood (deprecated, use server-specific) |
-
-### Autonomous Actions
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/autonomous/general` | POST | Make Miku say something random |
-| `/autonomous/engage` | POST | Engage a random user |
-| `/autonomous/tweet` | POST | Share a Miku tweet |
-| `/autonomous/reaction` | POST | React to a recent message |
-| `/autonomous/custom` | POST | Custom prompt (body: `{"prompt": "..."}`) |
-
-### Profile Pictures
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/profile-picture/change` | POST | Change profile picture (body: `{"mood": "happy"}`) |
-| `/profile-picture/revert` | POST | Revert to previous picture |
-| `/profile-picture/current` | GET | Get current picture metadata |
-
-### Utilities
-
-| Endpoint | Method | Description |
-|----------|--------|-------------|
-| `/conversation/reset` | POST | Clear conversation history for user |
-| `/logs` | GET | View bot logs (last 1000 lines) |
-| `/prompt` | GET | View current system prompt |
-| `/` | GET | Dashboard HTML page |
-
-### Example Usage
-
-```bash
-# Set mood to excited
-curl -X POST http://localhost:3939/servers/123456789/mood \
- -H "Content-Type: application/json" \
- -d '{"mood": "excited"}'
-
-# Make Miku say something
-curl -X POST http://localhost:3939/autonomous/general
-
-# Change profile picture
-curl -X POST http://localhost:3939/profile-picture/change \
- -H "Content-Type: application/json" \
- -d '{"mood": "flirty"}'
-```
-
----
-
-## 🎮 Usage Examples
-
-### Basic Interaction
-
-```
-User: Hey Miku! How are you today?
-Miku: Miku's doing great! 💙 Thanks for asking! ✨
-
-User: Can you see this? [uploads image]
-Miku: Ooh! 👀 I see a cute cat sitting on a keyboard! So fluffy! 🐱
-```
-
-### Mood Changes
-
-```
-User: /mood excited
-Miku: YAYYY!!! 🎉✨ Miku is SO EXCITED right now!!! Let's have fun! 💙🎶
-
-User: What's your favorite food?
-Miku: NEGI!! 🌱🌱🌱 Green onions are THE BEST! Want some?! ✨
-```
-
-### Image Generation
-
-```
-User: Draw yourself swimming in a pool
-Miku: Ooh! Let me create that for you! 🎨✨ [generates image]
-```
-
-### Autonomous Behavior
-
-```
-[After detecting activity in #general]
-Miku: Hey everyone! 👋 What are you all talking about? 💙
-```
-
----
-
-## 🛠️ Configuration
-
-### Model Configuration (`llama-swap-config.yaml`)
-
-```yaml
-models:
- llama3.1:
- cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99
- ttl: 1800 # 30 minutes
-
- vision:
- cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf
- ttl: 900 # 15 minutes
-```
-
-### Environment Variables
-
-| Variable | Default | Description |
-|----------|---------|-------------|
-| `DISCORD_BOT_TOKEN` | *Required* | Your Discord bot token |
-| `OWNER_USER_ID` | *Optional* | Your Discord user ID (for DM reports) |
-| `LLAMA_URL` | `http://llama-swap:8080` | Model server endpoint |
-| `TEXT_MODEL` | `llama3.1` | Text generation model name |
-| `VISION_MODEL` | `vision` | Vision model name |
-
-### Persistent Storage
-
-All data is stored in `bot/memory/`:
-- `servers_config.json` - Per-server settings
-- `autonomous_config.json` - Autonomous behavior settings
-- `conversation_history/` - User conversation data
-- `profile_pictures/` - Downloaded profile pictures
-- `dms/` - DM conversation logs
-- `figurine_subscribers.json` - Figurine notification subscribers
-
----
-
-## 📚 Documentation
-
-Detailed documentation available in the `readmes/` directory:
-
-- **[AUTONOMOUS_V2_IMPLEMENTED.md](readmes/AUTONOMOUS_V2_IMPLEMENTED.md)** - Autonomous system V2 details
-- **[VOICE_CHAT_IMPLEMENTATION.md](readmes/VOICE_CHAT_IMPLEMENTATION.md)** - Fish.audio TTS integration guide
-- **[PROFILE_PICTURE_FEATURE.md](readmes/PROFILE_PICTURE_FEATURE.md)** - Profile picture system
-- **[FACE_DETECTION_API_MIGRATION.md](readmes/FACE_DETECTION_API_MIGRATION.md)** - Face detection setup
-- **[DM_ANALYSIS_FEATURE.md](readmes/DM_ANALYSIS_FEATURE.md)** - DM interaction analytics
-- **[MOOD_SYSTEM_ANALYSIS.md](readmes/MOOD_SYSTEM_ANALYSIS.md)** - Mood system deep dive
-- **[QUICK_REFERENCE.md](readmes/QUICK_REFERENCE.md)** - llama.cpp setup and migration guide
-
----
-
-## 🐛 Troubleshooting
-
-### Bot won't start
-
-**Check if models are loaded:**
-```bash
-docker-compose logs llama-swap
-```
-
-**Verify GPU access:**
-```bash
-docker run --rm --runtime=nvidia nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
-```
-
-### High VRAM usage
-
-- Lower the `-ngl` parameter in `llama-swap-config.yaml` (reduce GPU layers)
-- Reduce context size with `-c` parameter
-- Use smaller quantization (Q3 instead of Q4)
-
-### Autonomous actions not triggering
-
-- Check `autonomous_config.json` - ensure enabled and cooldown settings
-- Verify activity in server (bot tracks engagement)
-- Check logs for decision engine output
-
-### Face detection not working
-
-- Ensure GPU is available: `docker-compose --profile tools up -d anime-face-detector`
-- Check API health: `curl http://localhost:6078/health`
-- View Gradio UI: http://localhost:7860
-
-### Models switching too frequently
-
-Increase TTL in `llama-swap-config.yaml`:
-```yaml
-ttl: 3600 # 1 hour instead of 30 minutes
-```
-
----
-
-## 🤝 Contributing
-
-### Development Setup
-
-For local development without Docker:
-
-```bash
-# Install dependencies
-cd bot
-pip install -r requirements.txt
-
-# Set environment variables
-export DISCORD_BOT_TOKEN="your_token"
-export LLAMA_URL="http://localhost:8080"
-
-# Run the bot
-python bot.py
-```
-
-### Code Style
-
-- Use type hints where possible
-- Follow PEP 8 conventions
-- Add docstrings to functions
-- Comment complex logic
-
----
-
-## 📝 License
-
-This project is provided as-is for educational and personal use. Please respect:
-- Discord's [Terms of Service](https://discord.com/terms)
-- Crypton Future Media's [Piapro Character License](https://piapro.net/intl/en_for_creators.html)
-- Model licenses (Llama 3.1, MiniCPM-V)
-
----
-
-## 🙏 Acknowledgments
-
-- **Crypton Future Media** - For creating Hatsune Miku
-- **llama.cpp** - For efficient local LLM inference
-- **mostlygeek/llama-swap** - For brilliant model management
-- **Discord.py** - For the excellent Discord API wrapper
-- **OpenAI** - For the API standard
-- **MiniCPM-V Team** - For the amazing vision model
-- **Danbooru** - For the artwork API
-
----
-
-## 💙 Support
-
-If you enjoy this project:
-- ⭐ Star this repository
-- 🐛 Report bugs via [Issues](https://github.com/yourusername/miku-discord/issues)
-- 💬 Share your Miku bot setup in [Discussions](https://github.com/yourusername/miku-discord/discussions)
-- 🎤 Listen to some Miku songs!
-
----
-
-
-
-**Made with 💙 by a Miku fan, for Miku fans**
-
-*"The future begins now!" - Hatsune Miku* 🎶✨
-
-[⬆ Back to Top](#-miku-discord-bot-)
-
-
diff --git a/SILENCE_DETECTION.md b/SILENCE_DETECTION.md
deleted file mode 100644
index 74b391d..0000000
--- a/SILENCE_DETECTION.md
+++ /dev/null
@@ -1,222 +0,0 @@
-# Silence Detection Implementation
-
-## What Was Added
-
-Implemented automatic silence detection to trigger final transcriptions in the new ONNX-based STT system.
-
-### Problem
-The new ONNX server requires manually sending a `{"type": "final"}` command to get the complete transcription. Without this, partial transcripts would appear but never be finalized and sent to LlamaCPP.
-
-### Solution
-Added silence tracking in `voice_receiver.py`:
-
-1. **Track audio timestamps**: Record when the last audio chunk was sent
-2. **Detect silence**: Start a timer after each audio chunk
-3. **Send final command**: If no new audio arrives within 1.5 seconds, send `{"type": "final"}`
-4. **Cancel on new audio**: Reset the timer if more audio arrives
-
----
-
-## Implementation Details
-
-### New Attributes
-```python
-self.last_audio_time: Dict[int, float] = {} # Track last audio per user
-self.silence_tasks: Dict[int, asyncio.Task] = {} # Silence detection tasks
-self.silence_timeout = 1.5 # Seconds of silence before "final"
-```
-
-### New Method
-```python
-async def _detect_silence(self, user_id: int):
- """
- Wait for silence timeout and send 'final' command to STT.
- Called after each audio chunk.
- """
- await asyncio.sleep(self.silence_timeout)
- stt_client = self.stt_clients.get(user_id)
- if stt_client and stt_client.is_connected():
-    if stt_client and stt_client.is_connected():
-        # Matches the log line shown in the Monitoring section below
-        logger.debug(f"Silence detected for user {user_id}, requesting final transcript")
-        await stt_client.send_final()
-
-### Integration
-- Called after sending each audio chunk
-- Cancels previous silence task if new audio arrives
-- Automatically cleaned up when stopping listening (see the sketch below)
-
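-A small sketch of that integration at the tail end of `_send_audio_chunk()` (surrounding code omitted; attribute names match the list above):
-
-```python
-import asyncio
-import time
-
-# Inside _send_audio_chunk(), after the chunk has been forwarded to STT:
-self.last_audio_time[user_id] = time.time()
-
-# New audio arrived, so cancel the previous countdown...
-if user_id in self.silence_tasks and not self.silence_tasks[user_id].done():
-    self.silence_tasks[user_id].cancel()
-
-# ...and start a fresh one; if nothing arrives for silence_timeout seconds,
-# _detect_silence() sends {"type": "final"} to the STT server.
-self.silence_tasks[user_id] = asyncio.create_task(self._detect_silence(user_id))
-```
-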
----
-
-## Testing
-
-### Test 1: Basic Transcription
-1. Join voice channel
-2. Run `!miku listen`
-3. **Speak a sentence** and wait 1.5 seconds
-4. **Expected**: Final transcript appears and is sent to LlamaCPP
-
-### Test 2: Continuous Speech
-1. Start listening
-2. **Speak multiple sentences** with pauses < 1.5s between them
-3. **Expected**: Partial transcripts update, final sent after last sentence
-
-### Test 3: Multiple Users
-1. Have 2+ users in voice channel
-2. Each runs `!miku listen`
-3. Both speak (taking turns or simultaneously)
-4. **Expected**: Each user's speech is transcribed independently
-
----
-
-## Configuration
-
-### Silence Timeout
-Default: `1.5` seconds
-
-**To adjust**, edit `voice_receiver.py`:
-```python
-self.silence_timeout = 1.5 # Change this value
-```
-
-**Recommendations**:
-- **Too short (< 1.0s)**: May cut off during natural pauses in speech
-- **Too long (> 3.0s)**: User waits too long for response
-- **Sweet spot**: 1.5-2.0s works well for conversational speech
-
----
-
-## Monitoring
-
-### Check Logs for Silence Detection
-```bash
-docker logs miku-bot 2>&1 | grep "Silence detected"
-```
-
-**Expected output**:
-```
-[DEBUG] Silence detected for user 209381657369772032, requesting final transcript
-```
-
-### Check Final Transcripts
-```bash
-docker logs miku-bot 2>&1 | grep "FINAL TRANSCRIPT"
-```
-
-### Check STT Processing
-```bash
-docker logs miku-stt 2>&1 | grep "Final transcription"
-```
-
----
-
-## Debugging
-
-### Issue: No Final Transcript
-**Symptoms**: Partial transcripts appear but never finalize
-
-**Debug steps**:
-1. Check if silence detection is triggering:
- ```bash
- docker logs miku-bot 2>&1 | grep "Silence detected"
- ```
-
-2. Check if final command is being sent:
- ```bash
- docker logs miku-stt 2>&1 | grep "type.*final"
- ```
-
-3. Increase log level in stt_client.py:
- ```python
- logger.setLevel(logging.DEBUG)
- ```
-
-### Issue: Cuts Off Mid-Sentence
-**Symptoms**: Final transcript triggers during natural pauses
-
-**Solution**: Increase silence timeout:
-```python
-self.silence_timeout = 2.0 # or 2.5
-```
-
-### Issue: Too Slow to Respond
-**Symptoms**: Long wait after user stops speaking
-
-**Solution**: Decrease silence timeout:
-```python
-self.silence_timeout = 1.0 # or 1.2
-```
-
----
-
-## Architecture
-
-```
-Discord Voice → voice_receiver.py
- ↓
- [Audio Chunk Received]
- ↓
- ┌─────────────────────┐
- │ send_audio() │
- │ to STT server │
- └─────────────────────┘
- ↓
- ┌─────────────────────┐
- │ Start silence │
- │ detection timer │
- │ (1.5s countdown) │
- └─────────────────────┘
- ↓
- ┌──────┴──────┐
- │ │
- More audio No more audio
- arrives for 1.5s
- │ │
- ↓ ↓
- Cancel timer ┌──────────────┐
- Start new │ send_final() │
- │ to STT │
- └──────────────┘
- ↓
- ┌─────────────────┐
- │ Final transcript│
- │ → LlamaCPP │
- └─────────────────┘
-```
-
----
-
-## Files Modified
-
-1. **bot/utils/voice_receiver.py**
- - Added `last_audio_time` tracking
- - Added `silence_tasks` management
- - Added `_detect_silence()` method
- - Integrated silence detection in `_send_audio_chunk()`
- - Added cleanup in `stop_listening()`
-
-2. **bot/utils/stt_client.py** (previously)
- - Added `send_final()` method
- - Added `send_reset()` method
- - Updated protocol handler
-
----
-
-## Next Steps
-
-1. **Test thoroughly** with different speech patterns
-2. **Tune silence timeout** based on user feedback
-3. **Consider VAD integration** for more accurate speech end detection
-4. **Add metrics** to track transcription latency
-
----
-
-**Status**: ✅ **READY FOR TESTING**
-
-The system now:
-- ✅ Connects to ONNX STT server (port 8766)
-- ✅ Uses CUDA GPU acceleration (cuDNN 9)
-- ✅ Receives partial transcripts
-- ✅ Automatically detects silence
-- ✅ Sends final command after 1.5s silence
-- ✅ Forwards final transcript to LlamaCPP
-
-**Test it now with `!miku listen`!**
diff --git a/STT_DEBUG_SUMMARY.md b/STT_DEBUG_SUMMARY.md
deleted file mode 100644
index 88e40d4..0000000
--- a/STT_DEBUG_SUMMARY.md
+++ /dev/null
@@ -1,207 +0,0 @@
-# STT Debug Summary - January 18, 2026
-
-## Issues Identified & Fixed ✅
-
-### 1. **CUDA Not Being Used** ❌ → ✅
-**Problem:** Container was falling back to CPU, causing slow transcription.
-
-**Root Cause:**
-```
-libcudnn.so.9: cannot open shared object file: No such file or directory
-```
-The ONNX Runtime requires cuDNN 9, but the base image `nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04` only had cuDNN 8.
-
-**Fix Applied:**
-```dockerfile
-# Changed from:
-FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
-
-# To:
-FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
-```
-
-**Verification:**
-```bash
-$ docker logs miku-stt 2>&1 | grep "Providers"
-INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', {'device_id': 0, ...}), 'CPUExecutionProvider']
-```
-✅ CUDAExecutionProvider is now loaded successfully!
-
----
-
-### 2. **Connection Refused Error** ❌ → ✅
-**Problem:** Bot couldn't connect to STT service.
-
-**Error:**
-```
-ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000)
-```
-
-**Root Cause:** Port mismatch between bot and STT server.
-- Bot was connecting to: `ws://miku-stt:8000`
-- STT server was running on: `ws://miku-stt:8766`
-
-**Fix Applied:**
-Updated `bot/utils/stt_client.py`:
-```python
-def __init__(
- self,
- user_id: str,
- stt_url: str = "ws://miku-stt:8766/ws/stt", # ← Changed from 8000
- ...
-)
-```
-
----
-
-### 3. **Protocol Mismatch** ❌ → ✅
-**Problem:** Bot and STT server were using incompatible protocols.
-
-**Old NeMo Protocol:**
-- Automatic VAD detection
-- Events: `vad`, `partial`, `final`, `interruption`
-- No manual control needed
-
-**New ONNX Protocol:**
-- Manual transcription control
-- Events: `transcript` (with `is_final` flag), `info`, `error`
-- Requires sending `{"type": "final"}` command to get final transcript
-
-**Fix Applied:**
-
-1. **Updated event handler** in `stt_client.py`:
-```python
-async def _handle_event(self, event: dict):
- event_type = event.get('type')
-
- if event_type == 'transcript':
- # New ONNX protocol
- text = event.get('text', '')
- is_final = event.get('is_final', False)
-
- if is_final:
- if self.on_final_transcript:
- await self.on_final_transcript(text, timestamp)
- else:
- if self.on_partial_transcript:
- await self.on_partial_transcript(text, timestamp)
-
- # Also maintains backward compatibility with old protocol
- elif event_type == 'partial' or event_type == 'final':
- # Legacy support...
-```
-
-2. **Added new methods** for manual control:
-```python
-async def send_final(self):
- """Request final transcription from STT server."""
- command = json.dumps({"type": "final"})
- await self.websocket.send_str(command)
-
-async def send_reset(self):
- """Reset the STT server's audio buffer."""
- command = json.dumps({"type": "reset"})
- await self.websocket.send_str(command)
-```
-
----
-
-## Current Status
-
-### Containers
-- ✅ `miku-stt`: Running with CUDA 12.6.2 + cuDNN 9
-- ✅ `miku-bot`: Rebuilt with updated STT client
-- ✅ Both containers healthy and communicating on correct port
-
-### STT Container Logs
-```
-CUDA Version 12.6.2
-INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
-INFO:asr.asr_pipeline:Model loaded successfully
-INFO:__main__:Server running on ws://0.0.0.0:8766
-INFO:__main__:Active connections: 0
-```
-
-### Files Modified
-1. `stt-parakeet/Dockerfile` - Updated base image to CUDA 12.6.2
-2. `bot/utils/stt_client.py` - Fixed port, protocol, added new methods
-3. `docker-compose.yml` - Already updated to use new STT service
-4. `STT_MIGRATION.md` - Added troubleshooting section
-
----
-
-## Testing Checklist
-
-### Ready to Test ✅
-- [x] CUDA GPU acceleration enabled
-- [x] Port configuration fixed
-- [x] Protocol compatibility updated
-- [x] Containers rebuilt and running
-
-### Next Steps for User 🧪
-1. **Test voice commands**: Use `!miku listen` in Discord
-2. **Verify transcription**: Check if audio is transcribed correctly
-3. **Monitor performance**: Check transcription speed and quality
-4. **Check logs**: Monitor `docker logs miku-bot` and `docker logs miku-stt` for errors
-
-### Expected Behavior
-- Bot connects to STT server successfully
-- Audio is streamed to STT server
-- Progressive transcripts appear (optional, may need VAD integration)
-- Final transcript is returned when user stops speaking
-- No more CUDA/cuDNN errors
-- No more connection refused errors
-
----
-
-## Technical Notes
-
-### GPU Utilization
-- **Before:** CPU fallback (0% GPU usage)
-- **After:** CUDA acceleration (~85-95% GPU usage on GTX 1660)
-
-### Performance Expectations
-- **Transcription Speed:** ~0.5-1 second per utterance (down from 2-3 seconds)
-- **VRAM Usage:** ~2-3GB (down from 4-5GB with NeMo)
-- **Model:** Parakeet TDT 0.6B (ONNX optimized)
-
-### Known Limitations
-- No word-level timestamps (ONNX model doesn't provide them)
-- Progressive transcription requires sending audio chunks regularly
-- Must call `send_final()` to get final transcript (not automatic)
-
----
-
-## Additional Information
-
-### Container Network
-- Network: `miku-discord_default`
-- STT Service: `miku-stt:8766`
-- Bot Service: `miku-bot`
-
-### Health Check
-```bash
-# Check STT container health
-docker inspect miku-stt | grep -A5 Health
-
-# Test WebSocket connection
-curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
- -H "Sec-WebSocket-Version: 13" -H "Sec-WebSocket-Key: test" \
- http://localhost:8766/
-```
-
-### Logs Monitoring
-```bash
-# Follow both containers
-docker-compose logs -f miku-bot miku-stt
-
-# Just STT
-docker logs -f miku-stt
-
-# Search for errors
-docker logs miku-bot 2>&1 | grep -i "error\|failed\|exception"
-```
-
----
-
-**Migration Status:** ✅ **COMPLETE - READY FOR TESTING**
diff --git a/STT_FIX_COMPLETE.md b/STT_FIX_COMPLETE.md
deleted file mode 100644
index a6605bd..0000000
--- a/STT_FIX_COMPLETE.md
+++ /dev/null
@@ -1,192 +0,0 @@
-# STT Fix Applied - Ready for Testing
-
-## Summary
-
-Fixed all three issues preventing the ONNX-based Parakeet STT from working:
-
-1. ✅ **CUDA Support**: Updated Docker base image to include cuDNN 9
-2. ✅ **Port Configuration**: Fixed bot to connect to port 8766 (found TWO places)
-3. ✅ **Protocol Compatibility**: Updated event handler for new ONNX format
-
----
-
-## Files Modified
-
-### 1. `stt-parakeet/Dockerfile`
-```diff
-- FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04
-+ FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
-```
-
-### 2. `bot/utils/stt_client.py`
-```diff
-- stt_url: str = "ws://miku-stt:8000/ws/stt"
-+ stt_url: str = "ws://miku-stt:8766/ws/stt"
-```
-
-Added new methods:
-- `send_final()` - Request final transcription
-- `send_reset()` - Clear audio buffer
-
-Updated `_handle_event()` to support:
-- New ONNX protocol: `{"type": "transcript", "is_final": true/false}`
-- Legacy protocol: `{"type": "partial"}`, `{"type": "final"}` (backward compatibility)
-
-### 3. `bot/utils/voice_receiver.py` ⚠️ **KEY FIX**
-```diff
-- def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8000/ws/stt"):
-+ def __init__(self, voice_manager, stt_url: str = "ws://miku-stt:8766/ws/stt"):
-```
-
-**This was the missing piece!** The `voice_receiver` was overriding the default URL.
-
----
-
-## Container Status
-
-### STT Container ✅
-```bash
-$ docker logs miku-stt 2>&1 | tail -10
-```
-```
-CUDA Version 12.6.2
-INFO:asr.asr_pipeline:Providers: [('CUDAExecutionProvider', ...)]
-INFO:asr.asr_pipeline:Model loaded successfully
-INFO:__main__:Server running on ws://0.0.0.0:8766
-INFO:__main__:Active connections: 0
-```
-
-**Status**: ✅ Running with CUDA acceleration
-
-### Bot Container ✅
-- Files copied directly into running container (faster than rebuild)
-- Python bytecode cache cleared
-- Container restarted
-
----
-
-## Testing Instructions
-
-### Test 1: Basic Connection
-1. Join a voice channel in Discord
-2. Run `!miku listen`
-3. **Expected**: Bot connects without "Connection Refused" error
-4. **Check logs**: `docker logs miku-bot 2>&1 | grep "STT"`
-
-### Test 2: Transcription
-1. After running `!miku listen`, speak into your microphone
-2. **Expected**: Your speech is transcribed
-3. **Check STT logs**: `docker logs miku-stt 2>&1 | tail -20`
-4. **Check bot logs**: Look for "Partial transcript" or "Final transcript" messages
-
-### Test 3: Performance
-1. Monitor GPU usage: `nvidia-smi -l 1`
-2. **Expected**: GPU utilization increases when transcribing
-3. **Expected**: Transcription completes in ~0.5-1 second
-
----
-
-## Monitoring Commands
-
-### Check Both Containers
-```bash
-docker logs -f --tail=50 miku-bot miku-stt
-```
-
-### Check STT Service Health
-```bash
-docker ps | grep miku-stt
-docker logs miku-stt 2>&1 | grep "CUDA\|Providers\|Server running"
-```
-
-### Check for Errors
-```bash
-# Bot errors
-docker logs miku-bot 2>&1 | grep -i "error\|failed" | tail -20
-
-# STT errors
-docker logs miku-stt 2>&1 | grep -i "error\|failed" | tail -20
-```
-
-### Test WebSocket Connection
-```bash
-# From host machine
-curl -i -N \
- -H "Connection: Upgrade" \
- -H "Upgrade: websocket" \
- -H "Sec-WebSocket-Version: 13" \
- -H "Sec-WebSocket-Key: test" \
- http://localhost:8766/
-```
-
----
-
-## Known Issues & Workarounds
-
-### Issue: Bot Still Shows Old Errors
-**Symptom**: After restart, logs still show port 8000 errors
-
-**Cause**: Python module caching or log entries from before restart
-
-**Solution**:
-```bash
-# Clear cache and restart
-docker exec miku-bot find /app -name "*.pyc" -delete
-docker restart miku-bot
-
-# Wait 10 seconds for full restart
-sleep 10
-```
-
-### Issue: Container Rebuild Takes 15+ Minutes
-**Cause**: `playwright install` downloads chromium/firefox browsers (~500MB)
-
-**Workaround**: Instead of full rebuild, use `docker cp`:
-```bash
-docker cp bot/utils/stt_client.py miku-bot:/app/utils/stt_client.py
-docker cp bot/utils/voice_receiver.py miku-bot:/app/utils/voice_receiver.py
-docker restart miku-bot
-```
-
----
-
-## Next Steps
-
-### For Full Deployment (after testing)
-1. Rebuild bot container properly:
- ```bash
- docker-compose build miku-bot
- docker-compose up -d miku-bot
- ```
-
-2. Remove old STT directory:
- ```bash
- mv stt stt.backup
- ```
-
-3. Update documentation to reflect new architecture
-
-### Optional Enhancements
-1. Add `send_final()` call when user stops speaking (VAD integration)
-2. Implement progressive transcription display
-3. Add transcription quality metrics/logging
-4. Test with multiple simultaneous users
-
----
-
-## Quick Reference
-
-| Component | Old (NeMo) | New (ONNX) |
-|-----------|------------|------------|
-| **Port** | 8000 | 8766 |
-| **VRAM** | 4-5GB | 2-3GB |
-| **Speed** | 2-3s | 0.5-1s |
-| **cuDNN** | 8 | 9 |
-| **CUDA** | 12.1 | 12.6.2 |
-| **Protocol** | Auto VAD | Manual control |
-
----
-
-**Status**: ✅ **ALL FIXES APPLIED - READY FOR USER TESTING**
-
-Last Updated: January 18, 2026 20:47 EET
diff --git a/STT_MIGRATION.md b/STT_MIGRATION.md
deleted file mode 100644
index 344c87e..0000000
--- a/STT_MIGRATION.md
+++ /dev/null
@@ -1,237 +0,0 @@
-# STT Migration: NeMo → ONNX Runtime
-
-## What Changed
-
-**Old Implementation** (`stt/`):
-- Used NVIDIA NeMo toolkit with PyTorch
-- Heavy memory usage (~4-5GB VRAM)
-- Complex dependency tree (NeMo, transformers, huggingface-hub conflicts)
-- Slow transcription (~2-3 seconds per utterance)
-- Custom VAD + FastAPI WebSocket server
-
-**New Implementation** (`stt-parakeet/`):
-- Uses `onnx-asr` library with ONNX Runtime
-- Optimized VRAM usage (~2-3GB VRAM)
-- Simple dependencies (onnxruntime-gpu, onnx-asr, numpy)
-- **Much faster transcription** (~0.5-1 second per utterance)
-- Clean architecture with modular ASR pipeline
-
-## Architecture
-
-```
-stt-parakeet/
-├── Dockerfile # CUDA 12.1 + Python 3.11 + ONNX Runtime
-├── requirements-stt.txt # Exact pinned dependencies
-├── asr/
-│ └── asr_pipeline.py # ONNX ASR wrapper with GPU acceleration
-├── server/
-│ └── ws_server.py # WebSocket server (port 8766)
-├── vad/
-│ └── silero_vad.py # Voice Activity Detection
-└── models/ # Model cache (auto-downloaded)
-```
-
-## Docker Setup
-
-### Build
-```bash
-docker-compose build miku-stt
-```
-
-### Run
-```bash
-docker-compose up -d miku-stt
-```
-
-### Check Logs
-```bash
-docker logs -f miku-stt
-```
-
-### Verify CUDA
-```bash
-docker exec miku-stt python3.11 -c "import onnxruntime as ort; print('CUDA:', 'CUDAExecutionProvider' in ort.get_available_providers())"
-```
-
-## API Changes
-
-### Old Protocol (port 8001)
-```python
-# FastAPI with /ws/stt/{user_id} endpoint
-ws://localhost:8001/ws/stt/123456
-
-# Events:
-{
- "type": "vad",
- "event": "speech_start" | "speaking" | "speech_end",
- "probability": 0.95
-}
-{
- "type": "partial",
- "text": "Hello",
- "words": []
-}
-{
- "type": "final",
- "text": "Hello world",
- "words": [{"word": "Hello", "start_time": 0.0, "end_time": 0.5}]
-}
-```
-
-### New Protocol (port 8766)
-```python
-# Direct WebSocket connection
-ws://localhost:8766
-
-# Send audio (binary):
-# - int16 PCM, 16kHz mono
-# - Send as raw bytes
-
-# Send commands (JSON):
-{"type": "final"} # Trigger final transcription
-{"type": "reset"} # Clear audio buffer
-
-# Receive transcripts:
-{
- "type": "transcript",
- "text": "Hello world",
- "is_final": false # Progressive transcription
-}
-{
- "type": "transcript",
- "text": "Hello world",
- "is_final": true # Final transcription after "final" command
-}
-```
-
-## Bot Integration Changes Needed
-
-### 1. Update WebSocket URL
-```python
-# Old
-ws://miku-stt:8000/ws/stt/{user_id}
-
-# New
-ws://miku-stt:8766
-```
-
-### 2. Update Message Format
-```python
-# Old: Send audio with metadata
-await websocket.send_bytes(audio_data)
-
-# New: Send raw audio bytes (same)
-await websocket.send(audio_data) # bytes
-
-# Old: Listen for VAD events
-if msg["type"] == "vad":
- # Handle VAD
-
-# New: No VAD events (handled internally)
-# Just send final command when user stops speaking
-await websocket.send(json.dumps({"type": "final"}))
-```
-
-### 3. Update Response Handling
-```python
-# Old
-if msg["type"] == "partial":
- text = msg["text"]
- words = msg["words"]
-
-if msg["type"] == "final":
- text = msg["text"]
- words = msg["words"]
-
-# New
-if msg["type"] == "transcript":
- text = msg["text"]
- is_final = msg["is_final"]
- # No word-level timestamps in ONNX version
-```
-
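-For a quick end-to-end check of the new protocol, a minimal standalone test client using the `websockets` library (the `localhost` URL and the second of silent audio are illustration-only assumptions):
-
-```python
-import asyncio
-import json
-
-import numpy as np
-import websockets
-
-async def main():
-    async with websockets.connect("ws://localhost:8766") as ws:
-        # 1 second of silence: int16 PCM, 16kHz mono (replace with real audio)
-        audio = np.zeros(16000, dtype=np.int16)
-        await ws.send(audio.tobytes())
-
-        # Ask the server to finalize, then read until the final transcript arrives
-        await ws.send(json.dumps({"type": "final"}))
-        while True:
-            msg = json.loads(await ws.recv())
-            print(msg)
-            if msg.get("is_final"):
-                break
-
-asyncio.run(main())
-```
-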
-## Performance Comparison
-
-| Metric | Old (NeMo) | New (ONNX) |
-|--------|-----------|-----------|
-| **VRAM Usage** | 4-5GB | 2-3GB |
-| **Transcription Speed** | 2-3s | 0.5-1s |
-| **Build Time** | ~10 min | ~5 min |
-| **Dependencies** | 50+ packages | 15 packages |
-| **GPU Utilization** | 60-70% | 85-95% |
-| **OOM Crashes** | Frequent | None |
-
-## Migration Steps
-
-1. ✅ Build new container: `docker-compose build miku-stt`
-2. ✅ Update bot WebSocket client (`bot/utils/stt_client.py`)
-3. ✅ Update voice receiver to send "final" command
-4. ⏳ Test transcription quality
-5. ⏳ Remove old `stt/` directory
-
-## Troubleshooting
-
-### Issue 1: CUDA Not Working (Falling Back to CPU)
-**Symptoms:**
-```
-[E:onnxruntime:Default] Failed to load library libonnxruntime_providers_cuda.so
-with error: libcudnn.so.9: cannot open shared object file
-```
-
-**Cause:** ONNX Runtime GPU requires cuDNN 9, but CUDA 12.1 base image only has cuDNN 8.
-
-**Fix:** Update Dockerfile base image:
-```dockerfile
-FROM nvidia/cuda:12.6.2-cudnn-runtime-ubuntu22.04
-```
-
-**Verify:**
-```bash
-docker logs miku-stt 2>&1 | grep "Providers"
-# Should show: CUDAExecutionProvider (not just CPUExecutionProvider)
-```
-
-### Issue 2: Connection Refused (Port 8000)
-**Symptoms:**
-```
-ConnectionRefusedError: [Errno 111] Connect call failed ('172.20.0.5', 8000)
-```
-
-**Cause:** New ONNX server runs on port 8766, not 8000.
-
-**Fix:** Update `bot/utils/stt_client.py`:
-```python
-stt_url: str = "ws://miku-stt:8766/ws/stt" # Changed from 8000
-```
-
-### Issue 3: Protocol Mismatch
-**Symptoms:** Bot doesn't receive transcripts, or transcripts are empty.
-
-**Cause:** New ONNX server uses different WebSocket protocol.
-
-**Old Protocol (NeMo):** Automatic VAD-triggered `partial` and `final` events
-**New Protocol (ONNX):** Manual control with `{"type": "final"}` command
-
-**Fix:**
-- Updated `stt_client._handle_event()` to handle `transcript` type with `is_final` flag
-- Added `send_final()` method to request final transcription
-- Bot should call `stt_client.send_final()` when user stops speaking
-
-## Rollback Plan
-
-If needed, revert docker-compose.yml:
-```yaml
-miku-stt:
- build:
- context: ./stt
- dockerfile: Dockerfile.stt
- # ... rest of old config
-```
-
-## Notes
-
-- Model downloads on first run (~600MB)
-- Models cached in `./stt-parakeet/models/`
-- No word-level timestamps (ONNX model doesn't provide them)
-- VAD handled internally (no need for external VAD integration)
-- Uses same GPU (GTX 1660, device 0) as before
diff --git a/STT_VOICE_TESTING.md b/STT_VOICE_TESTING.md
deleted file mode 100644
index 0bcabcc..0000000
--- a/STT_VOICE_TESTING.md
+++ /dev/null
@@ -1,266 +0,0 @@
-# STT Voice Testing Guide
-
-## Phase 4B: Bot-Side STT Integration - COMPLETE ✅
-
-All code has been deployed to containers. Ready for testing!
-
-## Architecture Overview
-
-```
-Discord Voice (User) → Opus 48kHz stereo
- ↓
- VoiceReceiver.write()
- ↓
- Opus decode → Stereo-to-mono → Resample to 16kHz
- ↓
- STTClient.send_audio() → WebSocket
- ↓
- miku-stt:8001 (Silero VAD + Faster-Whisper)
- ↓
- JSON events (vad, partial, final, interruption)
- ↓
- VoiceReceiver callbacks → voice_manager
- ↓
- on_final_transcript() → _generate_voice_response()
- ↓
- LLM streaming → TTS tokens → Audio playback
-```
-
-## New Voice Commands
-
-### 1. Start Listening
-```
-!miku listen
-```
-- Starts listening to **your** voice in the current voice channel
-- You must be in the same channel as Miku
-- Miku will transcribe your speech and respond with voice
-
-```
-!miku listen @username
-```
-- Start listening to a specific user's voice
-- Useful for moderators or testing with multiple users
-
-### 2. Stop Listening
-```
-!miku stop-listening
-```
-- Stop listening to your voice
-- Miku will no longer transcribe or respond to your speech
-
-```
-!miku stop-listening @username
-```
-- Stop listening to a specific user
-
-## Testing Procedure
-
-### Test 1: Basic STT Connection
-1. Join a voice channel
-2. `!miku join` - Miku joins your channel
-3. `!miku listen` - Start listening to your voice
-4. Check bot logs for "Started listening to user"
-5. Check STT logs: `docker logs miku-stt --tail 50`
- - Should show: "WebSocket connection from user {user_id}"
- - Should show: "Session started for user {user_id}"
-
-### Test 2: VAD Detection
-1. After `!miku listen`, speak into your microphone
-2. Say something like: "Hello Miku, can you hear me?"
-3. Check STT logs for VAD events:
- ```
- [DEBUG] VAD: speech_start probability=0.85
- [DEBUG] VAD: speaking probability=0.92
- [DEBUG] VAD: speech_end probability=0.15
- ```
-4. Bot logs should show: "VAD event for user {id}: speech_start/speaking/speech_end"
-
-### Test 3: Transcription
-1. Speak clearly into microphone: "Hey Miku, tell me a joke"
-2. Watch bot logs for:
- - "Partial transcript from user {id}: Hey Miku..."
- - "Final transcript from user {id}: Hey Miku, tell me a joke"
-3. Miku should respond with LLM-generated speech
-4. Check channel for: "🎤 Miku: *[her response]*"
-
-### Test 4: Interruption Detection
-1. `!miku listen`
-2. `!miku say Tell me a very long story about your favorite song`
-3. While Miku is speaking, start talking yourself
-4. Speak loudly enough to trigger VAD (probability > 0.7)
-5. Expected behavior:
- - Miku's audio should stop immediately
- - Bot logs: "User {id} interrupted Miku (probability={prob})"
- - STT logs: "Interruption detected during TTS playback"
- - RVC logs: "Interrupted: Flushed {N} ZMQ chunks"
-
-### Test 5: Multi-User (if available)
-1. Have two users join voice channel
-2. `!miku listen @user1` - Listen to first user
-3. `!miku listen @user2` - Listen to second user
-4. Both users speak separately
-5. Verify Miku responds to each user individually
-6. Check STT logs for multiple active sessions
-
-## Logs to Monitor
-
-### Bot Logs
-```bash
-docker logs -f miku-bot | grep -E "(listen|STT|transcript|interrupt)"
-```
-Expected output:
-```
-[INFO] Started listening to user 123456789 (username)
-[DEBUG] VAD event for user 123456789: speech_start
-[DEBUG] Partial transcript from user 123456789: Hello Miku...
-[INFO] Final transcript from user 123456789: Hello Miku, how are you?
-[INFO] User 123456789 interrupted Miku (probability=0.82)
-```
-
-### STT Logs
-```bash
-docker logs -f miku-stt
-```
-Expected output:
-```
-[INFO] WebSocket connection from user_123456789
-[INFO] Session started for user 123456789
-[DEBUG] Received 320 audio samples from user_123456789
-[DEBUG] VAD speech_start: probability=0.87
-[INFO] Transcribing audio segment (duration=2.5s)
-[INFO] Final transcript: "Hello Miku, how are you?"
-```
-
-### RVC Logs (for interruption)
-```bash
-docker logs -f miku-rvc-api | grep -i interrupt
-```
-Expected output:
-```
-[INFO] Interrupted: Flushed 15 ZMQ chunks, cleared 48000 RVC buffer samples
-```
-
-## Component Status
-
-### ✅ Completed
-- [x] STT container running (miku-stt:8001)
-- [x] Silero VAD on CPU with chunk buffering
-- [x] Faster-Whisper on GTX 1660 (1.3GB VRAM)
-- [x] STTClient WebSocket client
-- [x] VoiceReceiver Discord audio sink
-- [x] VoiceSession STT integration
-- [x] listen/stop-listening commands
-- [x] /interrupt endpoint in RVC API
-- [x] LLM response generation from transcripts
-- [x] Interruption detection and cancellation
-
-### ⏳ Pending Testing
-- [ ] Basic STT connection test
-- [ ] VAD speech detection test
-- [ ] End-to-end transcription test
-- [ ] LLM voice response test
-- [ ] Interruption cancellation test
-- [ ] Multi-user testing (if available)
-
-### 🔧 Configuration Tuning (after testing)
-- VAD sensitivity (currently threshold=0.5)
-- VAD timing (min_speech=250ms, min_silence=500ms)
-- Interruption threshold (currently 0.7)
-- Whisper beam size and patience
-- LLM streaming chunk size
-
-## API Endpoints
-
-### STT Container (port 8001)
-- WebSocket: `ws://localhost:8001/ws/stt/{user_id}`
-- Health: `http://localhost:8001/health`
-
-### RVC Container (port 8765)
-- WebSocket: `ws://localhost:8765/ws/stream`
-- Interrupt: `http://localhost:8765/interrupt` (POST)
-- Health: `http://localhost:8765/health`
-
-## Troubleshooting
-
-### No audio received from Discord
-- Check bot logs for "write() called with data"
-- Verify user is in same voice channel as Miku
-- Check Discord permissions (View Channel, Connect, Speak)
-
-### VAD not detecting speech
-- Check chunk buffer accumulation in STT logs
-- Verify audio format: PCM int16, 16kHz mono
-- Try speaking louder or more clearly
-- Check VAD threshold (may need adjustment)
-
-### Transcription empty or gibberish
-- Verify Whisper model loaded (check STT startup logs)
-- Check GPU VRAM usage: `nvidia-smi`
-- Ensure audio segments are at least 1-2 seconds long
-- Try speaking more clearly with less background noise
-
-### Interruption not working
-- Verify Miku is actually speaking (check miku_speaking flag)
-- Check VAD probability in logs (must be > 0.7)
-- Verify /interrupt endpoint returns success
-- Check RVC logs for flushed chunks
-
-### Multiple users causing issues
-- Check STT logs for per-user session management
-- Verify each user has separate STTClient instance
-- Check for resource contention on GTX 1660
-
-## Next Steps After Testing
-
-### Phase 4C: LLM KV Cache Precomputation
-- Use partial transcripts to start LLM generation early
-- Precompute KV cache for common phrases
-- Reduce latency between speech end and response start
-
-### Phase 4D: Multi-User Refinement
-- Queue management for multiple simultaneous speakers
-- Priority system for interruptions
-- Resource allocation for multiple Whisper requests
-
-### Phase 4E: Latency Optimization
-- Profile each stage of the pipeline
-- Optimize audio chunk sizes
-- Reduce WebSocket message overhead
-- Tune Whisper beam search parameters
-- Implement VAD lookahead for quicker detection
-
-## Hardware Utilization
-
-### Current Allocation
-- **AMD RX 6800**: LLaMA text models (idle during listen/speak)
-- **GTX 1660**:
- - Listen phase: Faster-Whisper (1.3GB VRAM)
- - Speak phase: Soprano TTS + RVC (time-multiplexed)
-- **CPU**: Silero VAD, audio preprocessing
-
-### Expected Performance
-- VAD latency: <50ms (CPU processing)
-- Transcription latency: 200-500ms (Whisper inference)
-- LLM streaming: 20-30 tokens/sec (RX 6800)
-- TTS synthesis: Real-time (GTX 1660)
-- Total latency (speech → response): 1-2 seconds
-
-## Testing Checklist
-
-Before marking Phase 4B as complete:
-
-- [ ] Test basic STT connection with `!miku listen`
-- [ ] Verify VAD detects speech start/end correctly
-- [ ] Confirm transcripts are accurate and complete
-- [ ] Test LLM voice response generation works
-- [ ] Verify interruption cancels TTS playback
-- [ ] Check multi-user handling (if possible)
-- [ ] Verify resource cleanup on `!miku stop-listening`
-- [ ] Test edge cases (silence, background noise, overlapping speech)
-- [ ] Profile latencies at each stage
-- [ ] Document any configuration tuning needed
-
----
-
-**Status**: Code deployed, ready for user testing! 🎤🤖
diff --git a/VOICE_CALL_AUTOMATION.md b/VOICE_CALL_AUTOMATION.md
deleted file mode 100644
index 63aa7b6..0000000
--- a/VOICE_CALL_AUTOMATION.md
+++ /dev/null
@@ -1,261 +0,0 @@
-# Voice Call Automation System
-
-## Overview
-
-Miku now has an automated voice call system that can be triggered from the web UI. This replaces the manual command-based voice chat flow with a seamless, immersive experience.
-
-## Features
-
-### 1. Voice Debug Mode Toggle
-- **Environment Variable**: `VOICE_DEBUG_MODE` (default: `false`)
-- When `true`: Shows manual commands, text notifications, transcripts in chat
-- When `false` (field deployment): Silent operation, no command notifications
-
-### 2. Automated Voice Call Flow
-
-#### Initiation (Web UI → API)
-```
-POST /api/voice/call
-{
- "user_id": 123456789,
- "voice_channel_id": 987654321
-}
-```
-
-#### What Happens:
-1. **Container Startup**: Starts `miku-stt` and `miku-rvc-api` containers
-2. **Warmup Wait**: Monitors containers until fully warmed up
- - STT: WebSocket connection check (30s timeout)
- - TTS: Health endpoint check for `warmed_up: true` (60s timeout)
-3. **Join Voice Channel**: Creates voice session with full resource locking
-4. **Send DM**: Generates a personalized invitation via the LLM and DMs it to the user along with a voice channel invite link
-5. **Auto-Listen**: Automatically starts listening when user joins
-
-#### User Join Detection:
-- Monitors `on_voice_state_update` events
-- When target user joins:
- - Marks `user_has_joined = True`
- - Cancels 30min timeout
- - Auto-starts STT for that user
-
-#### Auto-Leave After User Disconnect:
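-A hedged sketch of that handler (the session-manager helpers are assumptions based on the method names in this document):
-
-```python
-# In bot.py - assumes `bot` (commands.Bot) and a voice session manager singleton exist
-@bot.event
-async def on_voice_state_update(member, before, after):
-    session = voice_session_manager.get_active_session()  # name assumed
-    if session is None or member.id != session.call_user_id:
-        return
-
-    if before.channel is None and after.channel is not None:
-        # Called user just joined: cancel the 30min timeout, auto-start STT
-        await session.on_user_join(member.id)
-    elif before.channel is not None and after.channel is None:
-        # Called user left: start the 45s auto-leave timer
-        await session.on_user_leave(member.id)
-```
-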
-- **45 second timer** starts when user leaves voice channel
-- If user doesn't rejoin within 45s:
- - Ends voice session
- - Stops STT and TTS containers
- - Releases all resources
- - Returns to normal operation
-- If user rejoins before 45s, timer is cancelled
-
-#### 30-Minute Join Timeout:
-- If user never joins within 30 minutes:
- - Ends voice session
- - Stops containers
- - Sends timeout DM: "Aww, I guess you couldn't make it to voice chat... Maybe next time! 💙"
-
-### 3. Container Management
-
-**File**: `bot/utils/container_manager.py`
-
-#### Methods:
-- `start_voice_containers()`: Starts STT & TTS, waits for warmup
-- `stop_voice_containers()`: Stops both containers
-- `are_containers_running()`: Check container status
-- `_wait_for_stt_warmup()`: WebSocket connection check
-- `_wait_for_tts_warmup()`: Health endpoint check
-
-#### Warmup Detection:
-```python
-# STT Warmup: Try WebSocket connection
-ws://miku-stt:8765
-
-# TTS Warmup: Check health endpoint
-GET http://miku-rvc-api:8765/health
-Response: {"status": "ready", "warmed_up": true}
-```
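-
-A rough sketch of the TTS warmup poll using `aiohttp` - the endpoint and response shape come from this document, the rest is assumed:
-
-```python
-import asyncio
-import aiohttp
-
-async def wait_for_tts_warmup(timeout: float = 60.0) -> bool:
-    """Poll the RVC health endpoint until it reports warmed_up, or give up."""
-    deadline = asyncio.get_running_loop().time() + timeout
-    async with aiohttp.ClientSession() as session:
-        while asyncio.get_running_loop().time() < deadline:
-            try:
-                async with session.get("http://miku-rvc-api:8765/health") as resp:
-                    data = await resp.json()
-                    if data.get("warmed_up"):
-                        return True
-            except aiohttp.ClientError:
-                pass  # container may still be starting
-            await asyncio.sleep(2)
-    return False
-```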
-
-### 4. Voice Session Tracking
-
-**File**: `bot/utils/voice_manager.py`
-
-#### New VoiceSession Fields:
-```python
-call_user_id: Optional[int] # User ID that was called
-call_timeout_task: Optional[asyncio.Task] # 30min timeout
-user_has_joined: bool # Track if user joined
-auto_leave_task: Optional[asyncio.Task] # 45s auto-leave
-user_leave_time: Optional[float] # When user left
-```
-
-#### Methods:
-- `on_user_join(user_id)`: Handle user joining voice channel
-- `on_user_leave(user_id)`: Start 45s auto-leave timer
-- `_auto_leave_after_user_disconnect()`: Execute auto-leave
-
-### 5. LLM Context Update
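-A sketch of the auto-leave timer as it might live on `VoiceSession` (field and method names from the lists above; `end_session()` is an assumed cleanup helper):
-
-```python
-import asyncio
-
-AUTO_LEAVE_DELAY = 45  # seconds
-
-async def on_user_leave(self, user_id: int):
-    if user_id != self.call_user_id:
-        return
-    self.auto_leave_task = asyncio.create_task(self._auto_leave_after_user_disconnect())
-
-async def on_user_join(self, user_id: int):
-    # User came back in time: cancel the pending auto-leave
-    if self.auto_leave_task and not self.auto_leave_task.done():
-        self.auto_leave_task.cancel()
-
-async def _auto_leave_after_user_disconnect(self):
-    try:
-        await asyncio.sleep(AUTO_LEAVE_DELAY)
-        await self.end_session()  # assumed: stops containers, releases resources
-    except asyncio.CancelledError:
-        pass  # timer cancelled because the user rejoined
-```
-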
-
-Miku's voice chat prompt now includes:
-```
-NOTE: You will automatically disconnect 45 seconds after {user.name} leaves the voice channel,
-so you can mention this if asked about leaving
-```
-
-### 6. Debug Mode Integration
-
-#### With `VOICE_DEBUG_MODE=true`:
-- Shows "🎤 User said: ..." in text chat
-- Shows "💬 Miku: ..." responses
-- Shows interruption messages
-- Manual commands work (`!miku join`, `!miku listen`, etc.)
-
-#### With `VOICE_DEBUG_MODE=false` (field deployment):
-- No text notifications
-- No command outputs
-- Silent operation
-- Only log files show activity
-
-## API Endpoint
-
-### POST `/api/voice/call`
-
-**Request Body**:
-```json
-{
- "user_id": 123456789,
- "voice_channel_id": 987654321
-}
-```
-
-**Success Response**:
-```json
-{
- "success": true,
- "user_id": 123456789,
- "channel_id": 987654321,
- "invite_url": "https://discord.gg/abc123"
-}
-```
-
-**Error Response**:
-```json
-{
- "success": false,
- "error": "Failed to start voice containers"
-}
-```
-
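-Example call from a script or the Web UI backend (host and port are placeholders; adjust them to wherever the bot API is exposed in your deployment):
-
-```python
-import requests
-
-resp = requests.post(
-    "http://localhost:8081/api/voice/call",   # placeholder host/port
-    json={"user_id": 123456789, "voice_channel_id": 987654321},
-    timeout=120,  # container startup + warmup can take ~35-75 seconds
-)
-data = resp.json()
-if data.get("success"):
-    print("Call started, invite:", data["invite_url"])
-else:
-    print("Call failed:", data.get("error"))
-```
-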
-## File Changes
-
-### New Files:
-1. `bot/utils/container_manager.py` - Docker container management
-2. `VOICE_CALL_AUTOMATION.md` - This documentation
-
-### Modified Files:
-1. `bot/globals.py` - Added `VOICE_DEBUG_MODE` flag
-2. `bot/api.py` - Added `/api/voice/call` endpoint and timeout handler
-3. `bot/bot.py` - Added `on_voice_state_update` event handler
-4. `bot/utils/voice_manager.py`:
- - Added call tracking fields to VoiceSession
- - Added `on_user_join()` and `on_user_leave()` methods
- - Added `_auto_leave_after_user_disconnect()` method
- - Updated LLM prompt with auto-disconnect context
- - Gated debug messages behind `VOICE_DEBUG_MODE`
-5. `bot/utils/voice_receiver.py` - Removed Discord VAD events (rely on RealtimeSTT only)
-
-## Testing Checklist
-
-### Web UI Integration:
-- [ ] Create voice call trigger UI with user ID and channel ID inputs
-- [ ] Display call status (starting containers, waiting for warmup, joined VC, waiting for user)
-- [ ] Show timeout countdown
-- [ ] Handle errors gracefully
-
-### Flow Testing:
-- [ ] Test successful call flow (containers start → warmup → join → DM → user joins → conversation → user leaves → 45s timer → auto-leave → containers stop)
-- [ ] Test 30min timeout (user never joins)
-- [ ] Test user rejoin within 45s (cancels auto-leave)
-- [ ] Test container failure handling
-- [ ] Test warmup timeout handling
-- [ ] Test DM failure (call flow should continue even if the DM cannot be delivered)
-
-### Debug Mode:
-- [ ] Test with `VOICE_DEBUG_MODE=true` (should see all notifications)
-- [ ] Test with `VOICE_DEBUG_MODE=false` (should be silent)
-
-## Environment Variables
-
-Add to `.env` or `docker-compose.yml`:
-```bash
-VOICE_DEBUG_MODE=false # Set to true for debugging
-```
-
-## Next Steps
-
-1. **Web UI**: Create voice call interface with:
- - User ID input
- - Voice channel ID dropdown (fetch from Discord)
- - "Call User" button
- - Status display
- - Active call management
-
-2. **Monitoring**: Add voice call metrics:
- - Call duration
- - User join time
- - Auto-leave triggers
- - Container startup times
-
-3. **Enhancements**:
- - Multiple simultaneous calls (different channels)
- - Call history logging
- - User preferences (auto-answer, DND mode)
- - Scheduled voice calls
-
-## Technical Notes
-
-### Container Warmup Times:
-- **STT** (`miku-stt`): ~5-15 seconds (model loading)
-- **TTS** (`miku-rvc-api`): ~30-60 seconds (RVC model loading, synthesis warmup)
-- **Total**: ~35-75 seconds from API call to ready
-
-### Resource Management:
-- Voice sessions use `VoiceSessionManager` singleton
-- Only one voice session active at a time
-- Full resource locking during voice:
- - AMD GPU reserved for text inference
- - Vision model blocked
- - Image generation disabled
- - Bipolar mode disabled
- - Autonomous engine paused
-
-### Cleanup Guarantees:
-- 45s auto-leave ensures no orphaned sessions
-- 30min timeout prevents indefinite container running
-- All cleanup paths stop containers
-- Voice session end releases all resources
-
-## Troubleshooting
-
-### Containers won't start:
-- Check Docker daemon status
-- Check `docker compose ps` for existing containers
-- Check logs: `docker logs miku-stt` / `docker logs miku-rvc-api`
-
-### Warmup timeout:
-- STT: Check WebSocket is accepting connections on port 8765
-- TTS: Check health endpoint returns `{"warmed_up": true}`
-- Increase timeout values if needed (slow hardware)
-
-### User never joins:
-- Verify invite URL is valid
-- Check user has permission to join voice channel
-- Verify DM was delivered (may be blocked)
-
-### Auto-leave not triggering:
-- Check `on_voice_state_update` events are firing
-- Verify user ID matches `call_user_id`
-- Check logs for timer creation/cancellation
-
-### Containers not stopping:
-- Manual stop: `docker compose stop miku-stt miku-rvc-api`
-- Check for orphaned containers: `docker ps`
-- Force remove: `docker rm -f miku-stt miku-rvc-api`
diff --git a/VOICE_CHAT_CONTEXT.md b/VOICE_CHAT_CONTEXT.md
deleted file mode 100644
index 55a8d8f..0000000
--- a/VOICE_CHAT_CONTEXT.md
+++ /dev/null
@@ -1,225 +0,0 @@
-# Voice Chat Context System
-
-## Implementation Complete ✅
-
-Added comprehensive voice chat context to give Miku awareness of the conversation environment.
-
----
-
-## Features
-
-### 1. Voice-Aware System Prompt
-Miku now knows she's in a voice chat and adjusts her behavior:
-- ✅ Aware she's speaking via TTS
-- ✅ Knows who she's talking to (user names included)
-- ✅ Understands responses will be spoken aloud
-- ✅ Instructed to keep responses short (1-3 sentences)
-- ✅ **CRITICAL: Instructed to only use English** (TTS can't handle Japanese well)
-
-### 2. Conversation History (Last 8 Exchanges)
-- Stores last 16 messages (8 user + 8 assistant)
-- Maintains context across multiple voice interactions
-- Automatically trimmed to keep memory manageable
-- Each message includes username for multi-user context
-
-### 3. Personality Integration
-- Loads `miku_lore.txt` - Her background, personality, likes/dislikes
-- Loads `miku_prompt.txt` - Core personality instructions
-- Combines with voice-specific instructions
-- Maintains character consistency
-
-### 4. Reduced Log Spam
-- Set voice_recv logger to CRITICAL level
-- Suppresses routine CryptoErrors and RTCP packets
-- Only shows actual critical errors
-
----
-
-## System Prompt Structure
-
-```
-[miku_prompt.txt content]
-
-[miku_lore.txt content]
-
-VOICE CHAT CONTEXT:
-- You are currently in a voice channel speaking with {user.name} and others
-- Your responses will be spoken aloud via text-to-speech
-- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)
-- Speak naturally as if having a real-time voice conversation
-- IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.
-- Be expressive and use casual language, but stay in character as Miku
-
-Remember: This is a live voice conversation, so be concise and engaging!
-```
-
----
-
-## Conversation Flow
-
-```
-User speaks → STT transcribes → Add to history
- ↓
- [System Prompt]
- [Last 8 exchanges]
- [Current user message]
- ↓
- LLM generates
- ↓
- Add response to history
- ↓
- Stream to TTS → Speak
-```
-
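-A sketch of how one turn of this flow could assemble the chat payload (prompt file paths and the history field come from this document; the helper name and exact wording are illustrative):
-
-```python
-def build_voice_messages(session, user_name: str, transcript: str) -> list:
-    with open("/app/miku_prompt.txt") as f:
-        prompt = f.read()
-    with open("/app/miku_lore.txt") as f:
-        lore = f.read()
-
-    system_prompt = (
-        f"{prompt}\n\n{lore}\n\n"
-        "VOICE CHAT CONTEXT:\n"
-        f"- You are currently in a voice channel speaking with {user_name} and others\n"
-        "- Keep responses SHORT and CONVERSATIONAL (1-3 sentences max)\n"
-        "- IMPORTANT: Only respond in ENGLISH!"
-    )
-
-    # System prompt + last 8 exchanges + the new user message
-    messages = [{"role": "system", "content": system_prompt}]
-    messages += session.conversation_history[-16:]
-    messages.append({"role": "user", "content": f"{user_name}: {transcript}"})
-    return messages
-```
-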
----
-
-## Message History Format
-
-```python
-conversation_history = [
- {"role": "user", "content": "koko210: Hey Miku, how are you?"},
- {"role": "assistant", "content": "Hey koko210! I'm doing great, thanks for asking!"},
- {"role": "user", "content": "koko210: Can you sing something?"},
- {"role": "assistant", "content": "I'd love to! What song would you like to hear?"},
- # ... up to 16 messages total (8 exchanges)
-]
-```
-
----
-
-## Configuration
-
-### Conversation History Limit
-**Current**: 16 messages (8 exchanges)
-
-To adjust, edit `voice_manager.py`:
-```python
-# Keep only last 8 exchanges (16 messages = 8 user + 8 assistant)
-if len(self.conversation_history) > 16:
- self.conversation_history = self.conversation_history[-16:]
-```
-
-**Recommendations**:
-- **8 exchanges**: Good balance (current setting)
-- **12 exchanges**: More context, slightly more tokens
-- **4 exchanges**: Minimal context, faster responses
-
-### Response Length
-**Current**: max_tokens=200
-
-To adjust:
-```python
-payload = {
- "max_tokens": 200 # Change this
-}
-```
-
----
-
-## Language Enforcement
-
-### Why English-Only?
-The RVC TTS system is trained on English audio and struggles with:
-- Japanese characters (even though Miku is Japanese!)
-- Special characters
-- Mixed language text
-- Non-English phonetics
-
-### Implementation
-The system prompt explicitly tells Miku:
-> **IMPORTANT: Only respond in ENGLISH! The TTS system cannot handle Japanese or other languages well.**
-
-This is reinforced in every voice chat interaction.
-
----
-
-## Testing
-
-### Test 1: Basic Conversation
-```
-User: "Hey Miku!"
-Miku: "Hi there! Great to hear from you!" (should be in English)
-User: "How are you doing?"
-Miku: "I'm doing wonderful! How about you?" (remembers previous exchange)
-```
-
-### Test 2: Context Retention
-Have a multi-turn conversation and verify Miku remembers:
-- Previous topics discussed
-- User names
-- Conversation flow
-
-### Test 3: Response Length
-Verify responses are:
-- Short (1-3 sentences)
-- Conversational
-- Not truncated mid-sentence
-
-### Test 4: Language Enforcement
-Try asking in Japanese or requesting Japanese response:
-- Miku should politely respond in English
-- Should explain she needs to use English for voice chat
-
----
-
-## Monitoring
-
-### Check Conversation History
-```python
-# Add debug logging to voice_manager.py to see history
-logger.debug(f"Conversation history: {self.conversation_history}")
-```
-
-### Check System Prompt
-```bash
-docker exec miku-bot cat /app/miku_prompt.txt
-docker exec miku-bot cat /app/miku_lore.txt
-```
-
-### Monitor Responses
-```bash
-docker logs -f miku-bot | grep "Voice response complete"
-```
-
----
-
-## Files Modified
-
-1. **bot/bot.py**
- - Changed voice_recv logger level from WARNING to CRITICAL
- - Suppresses CryptoError spam
-
-2. **bot/utils/voice_manager.py**
- - Added `conversation_history` to `VoiceSession.__init__()`
- - Updated `_generate_voice_response()` to load lore files
- - Built comprehensive voice-aware system prompt
- - Implemented conversation history tracking (last 8 exchanges)
- - Added English-only instruction
- - Saves both user and assistant messages to history
-
----
-
-## Benefits
-
-✅ **Better Context**: Miku remembers previous exchanges
-✅ **Cleaner Logs**: No more CryptoError spam
-✅ **Natural Responses**: Knows she's in voice chat, responds appropriately
-✅ **Language Consistency**: Enforces English for TTS compatibility
-✅ **Personality Intact**: Still loads lore and personality files
-✅ **User Awareness**: Knows who she's talking to
-
----
-
-## Next Steps
-
-1. **Test thoroughly** with multi-turn conversations
-2. **Adjust history length** if needed (currently 8 exchanges)
-3. **Fine-tune response length** based on TTS performance
-4. **Add conversation reset** command if needed (e.g., `!miku reset`)
-5. **Consider adding** conversation summaries for very long sessions
-
----
-
-**Status**: ✅ **DEPLOYED AND READY FOR TESTING**
-
-Miku now has full context awareness in voice chat with personality, conversation history, and language enforcement!
diff --git a/VOICE_TO_VOICE_REFERENCE.md b/VOICE_TO_VOICE_REFERENCE.md
deleted file mode 100644
index e9b1dca..0000000
--- a/VOICE_TO_VOICE_REFERENCE.md
+++ /dev/null
@@ -1,323 +0,0 @@
-# Voice-to-Voice Quick Reference
-
-## Complete Pipeline Status ✅
-
-All phases complete and deployed!
-
-## Phase Completion Status
-
-### ✅ Phase 1: Voice Connection (COMPLETE)
-- Discord voice channel connection
-- Audio playback via discord.py
-- Resource management and cleanup
-
-### ✅ Phase 2: Audio Streaming (COMPLETE)
-- Soprano TTS server (GTX 1660)
-- RVC voice conversion
-- Real-time streaming via WebSocket
-- Token-by-token synthesis
-
-### ✅ Phase 3: Text-to-Voice (COMPLETE)
-- LLaMA text generation (AMD RX 6800)
-- Streaming token pipeline
-- TTS integration with `!miku say`
-- Natural conversation flow
-
-### ✅ Phase 4A: STT Container (COMPLETE)
-- Silero VAD on CPU
-- Faster-Whisper on GTX 1660
-- WebSocket server at port 8001
-- Per-user session management
-- Chunk buffering for VAD
-
-### ✅ Phase 4B: Bot STT Integration (COMPLETE - READY FOR TESTING)
-- Discord audio capture
-- Opus decode + resampling
-- STT client WebSocket integration
-- Voice commands: `!miku listen`, `!miku stop-listening`
-- LLM voice response generation
-- Interruption detection and cancellation
-- `/interrupt` endpoint in RVC API
-
-## Quick Start Commands
-
-### Setup
-```bash
-!miku join # Join your voice channel
-!miku listen # Start listening to your voice
-```
-
-### Usage
-- **Speak** into your microphone
-- Miku will **transcribe** your speech
-- Miku will **respond** with voice
-- **Interrupt** her by speaking while she's talking
-
-### Teardown
-```bash
-!miku stop-listening # Stop listening to your voice
-!miku leave # Leave voice channel
-```
-
-## Architecture Diagram
-
-```
-┌─────────────────────────────────────────────────────────────────┐
-│ USER INPUT │
-└─────────────────────────────────────────────────────────────────┘
- │
- │ Discord Voice (Opus 48kHz)
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-bot Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ VoiceReceiver (discord.sinks.Sink) │ │
-│ │ - Opus decode → PCM │ │
-│ │ - Stereo → Mono │ │
-│ │ - Resample 48kHz → 16kHz │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-│ │ PCM int16, 16kHz, 20ms chunks │
-│ ┌─────────────────▼─────────────────────────────────────────┐ │
-│ │ STTClient (WebSocket) │ │
-│ │ - Sends audio to miku-stt │ │
-│ │ - Receives VAD events, transcripts │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-└────────────────────┼───────────────────────────────────────────┘
- │ ws://miku-stt:8001/ws/stt/{user_id}
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-stt Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ VADProcessor (Silero VAD 5.1.2) [CPU] │ │
-│ │ - Chunk buffering (512 samples min) │ │
-│ │ - Speech detection (threshold=0.5) │ │
-│ │ - Events: speech_start, speaking, speech_end │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-│ │ Audio segments │
-│ ┌─────────────────▼─────────────────────────────────────────┐ │
-│ │ WhisperTranscriber (Faster-Whisper 1.2.1) [GTX 1660] │ │
-│ │ - Model: small (1.3GB VRAM) │ │
-│ │ - Transcribes speech segments │ │
-│ │ - Returns: partial & final transcripts │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-└────────────────────┼───────────────────────────────────────────┘
- │ JSON events via WebSocket
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-bot Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ voice_manager.py Callbacks │ │
-│ │ - on_vad_event() → Log VAD states │ │
-│ │ - on_partial_transcript() → Show typing indicator │ │
-│ │ - on_final_transcript() → Generate LLM response │ │
-│ │ - on_interruption() → Cancel TTS playback │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-│ │ Final transcript text │
-│ ┌─────────────────▼─────────────────────────────────────────┐ │
-│ │ _generate_voice_response() │ │
-│ │ - Build LLM prompt with conversation history │ │
-│ │ - Stream LLM response │ │
-│ │ - Send tokens to TTS │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-└────────────────────┼───────────────────────────────────────────┘
- │ HTTP streaming to LLaMA server
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ llama-cpp-server (AMD RX 6800) │
-│ - Streaming text generation │
-│ - 20-30 tokens/sec │
-│ - Returns: {"delta": {"content": "token"}} │
-└─────────────────┬───────────────────────────────────────────────┘
- │ Token stream
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-bot Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ audio_source.send_token() │ │
-│ │ - Buffers tokens │ │
-│ │ - Sends to RVC WebSocket │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-└────────────────────┼───────────────────────────────────────────┘
- │ ws://miku-rvc-api:8765/ws/stream
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-rvc-api Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ Soprano TTS Server (miku-soprano-tts) [GTX 1660] │ │
-│ │ - Text → Audio synthesis │ │
-│ │ - 32kHz output │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-│ │ Raw audio via ZMQ │
-│ ┌─────────────────▼─────────────────────────────────────────┐ │
-│ │ RVC Voice Conversion [GTX 1660] │ │
-│ │ - Voice cloning & pitch shifting │ │
-│ │ - 48kHz output │ │
-│ └─────────────────┬─────────────────────────────────────────┘ │
-└────────────────────┼───────────────────────────────────────────┘
- │ PCM float32, 48kHz
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ miku-bot Container │
-│ ┌───────────────────────────────────────────────────────────┐ │
-│ │ discord.VoiceClient │ │
-│ │ - Plays audio in voice channel │ │
-│ │ - Can be interrupted by user speech │ │
-│ └───────────────────────────────────────────────────────────┘ │
-└─────────────────────────────────────────────────────────────────┘
- │
- ▼
-┌─────────────────────────────────────────────────────────────────┐
-│ USER OUTPUT │
-│ (Miku's voice response) │
-└─────────────────────────────────────────────────────────────────┘
-```
-
-## Interruption Flow
-
-```
-User speaks during Miku's TTS
- │
- ▼
-VAD detects speech (probability > 0.7)
- │
- ▼
-STT sends interruption event
- │
- ▼
-on_user_interruption() callback
- │
- ▼
-_cancel_tts() → voice_client.stop()
- │
- ▼
-POST http://miku-rvc-api:8765/interrupt
- │
- ▼
-Flush ZMQ socket + clear RVC buffers
- │
- ▼
-Miku stops speaking, ready for new input
-```
-
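-The callback path above, sketched in Python (the `miku_speaking` flag appears in the troubleshooting table below; the rest of the method body is an assumption):
-
-```python
-import aiohttp
-
-async def on_user_interruption(self, user_id: int) -> None:
-    """Handle an interruption event reported by the STT container."""
-    if not self.miku_speaking:
-        return
-    # Stop Discord playback immediately
-    self.voice_client.stop()
-    # Ask the RVC API to flush any audio still being synthesized
-    async with aiohttp.ClientSession() as http:
-        await http.post("http://miku-rvc-api:8765/interrupt")
-    self.miku_speaking = False
-```
-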
-## Hardware Utilization
-
-### Listen Phase (User Speaking)
-- **CPU**: Silero VAD processing
-- **GTX 1660**: Faster-Whisper transcription (1.3GB VRAM)
-- **AMD RX 6800**: Idle
-
-### Think Phase (LLM Generation)
-- **CPU**: Idle
-- **GTX 1660**: Idle
-- **AMD RX 6800**: LLaMA inference (20-30 tokens/sec)
-
-### Speak Phase (Miku Responding)
-- **CPU**: Silero VAD monitoring for interruption
-- **GTX 1660**: Soprano TTS + RVC synthesis
-- **AMD RX 6800**: Idle
-
-## Performance Metrics
-
-### Expected Latencies
-| Stage | Latency |
-|--------------------------|--------------|
-| Discord audio capture | ~20ms |
-| Opus decode + resample | <10ms |
-| VAD processing | <50ms |
-| Whisper transcription | 200-500ms |
-| LLM token generation | 33-50ms/tok |
-| TTS synthesis | Real-time |
-| **Total (speech → response)** | **1-2s** |
-
-### VRAM Usage
-| GPU | Component | VRAM |
-|-------------|----------------|-----------|
-| AMD RX 6800 | LLaMA 8B Q4 | ~5.5GB |
-| GTX 1660 | Whisper small | 1.3GB |
-| GTX 1660 | Soprano + RVC | ~3GB |
-
-## Key Files
-
-### Bot Container
-- `bot/utils/stt_client.py` - WebSocket client for STT
-- `bot/utils/voice_receiver.py` - Discord audio sink
-- `bot/utils/voice_manager.py` - Voice session with STT integration
-- `bot/commands/voice.py` - Voice commands including listen/stop-listening
-
-### STT Container
-- `stt/vad_processor.py` - Silero VAD with chunk buffering
-- `stt/whisper_transcriber.py` - Faster-Whisper transcription
-- `stt/stt_server.py` - FastAPI WebSocket server
-
-### RVC Container
-- `soprano_to_rvc/soprano_rvc_api.py` - TTS + RVC pipeline with /interrupt endpoint
-
-## Configuration Files
-
-### docker-compose.yml
-- Network: `miku-network` (all containers)
-- Ports:
- - miku-bot: 8081 (API)
- - miku-rvc-api: 8765 (TTS)
- - miku-stt: 8001 (STT)
- - llama-cpp-server: 8080 (LLM)
-
-### VAD Settings (stt/vad_processor.py)
-```python
-threshold = 0.5 # Speech detection sensitivity
-min_speech = 250 # Minimum speech duration (ms)
-min_silence = 500 # Silence before speech_end (ms)
-interruption_threshold = 0.7 # Probability for interruption
-```
-
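-Silero VAD expects chunks of at least 512 samples at 16 kHz (per the buffering note in the architecture diagram), while Discord delivers 20 ms (320-sample) frames. A sketch of such a buffer (class name and wiring are illustrative):
-
-```python
-import numpy as np
-import torch
-
-class VADBuffer:
-    """Accumulate 20 ms PCM frames until full 512-sample VAD chunks are available."""
-
-    def __init__(self, model, threshold: float = 0.5):
-        self.model = model            # loaded Silero VAD model
-        self.threshold = threshold
-        self.buffer = np.empty(0, dtype=np.float32)
-
-    def push(self, pcm_int16: bytes) -> list:
-        """Feed one int16 mono 16 kHz frame; return speech probabilities."""
-        samples = np.frombuffer(pcm_int16, dtype=np.int16).astype(np.float32) / 32768.0
-        self.buffer = np.concatenate([self.buffer, samples])
-        probs = []
-        while len(self.buffer) >= 512:
-            chunk, self.buffer = self.buffer[:512], self.buffer[512:]
-            probs.append(self.model(torch.from_numpy(chunk), 16000).item())
-        return probs
-```
-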
-### Whisper Settings (stt/whisper_transcriber.py)
-```python
-model = "small" # 1.3GB VRAM
-device = "cuda"
-compute_type = "float16"
-beam_size = 5
-patience = 1.0
-```
-
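-These settings map directly onto the `faster-whisper` API; a minimal usage sketch (the segment-joining helper is illustrative):
-
-```python
-from faster_whisper import WhisperModel
-
-model = WhisperModel("small", device="cuda", compute_type="float16")
-
-def transcribe_segment(audio_f32):
-    """Transcribe one float32 mono 16 kHz speech segment (numpy array)."""
-    segments, _info = model.transcribe(audio_f32, beam_size=5, patience=1.0)
-    return " ".join(seg.text.strip() for seg in segments)
-```
-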
-## Testing Commands
-
-```bash
-# Check all container health
-curl http://localhost:8001/health # STT
-curl http://localhost:8765/health # RVC
-curl http://localhost:8080/health # LLM
-
-# Monitor logs
-docker logs -f miku-bot | grep -E "(listen|transcript|interrupt)"
-docker logs -f miku-stt
-docker logs -f miku-rvc-api | grep interrupt
-
-# Test interrupt endpoint
-curl -X POST http://localhost:8765/interrupt
-
-# Check GPU usage
-nvidia-smi
-```
-
-## Troubleshooting
-
-| Issue | Solution |
-|-------|----------|
-| No audio from Discord | Check bot has Connect and Speak permissions |
-| VAD not detecting | Speak louder, check microphone, lower threshold |
-| Empty transcripts | Speak for at least 1-2 seconds, check Whisper model |
-| Interruption not working | Verify `miku_speaking=true`, check VAD probability |
-| High latency | Profile each stage, check GPU utilization |
-
-## Next Features (Phase 4C+)
-
-- [ ] KV cache precomputation from partial transcripts
-- [ ] Multi-user simultaneous conversation
-- [ ] Latency optimization (<1s total)
-- [ ] Voice activity history and analytics
-- [ ] Emotion detection from speech patterns
-- [ ] Context-aware interruption handling
-
----
-
-**Ready to test!** Use `!miku join` → `!miku listen` → speak to Miku 🎤