Add dual GPU support with web UI selector
Features:
- Built custom ROCm container for AMD RX 6800 GPU
- Added GPU selection toggle in web UI (NVIDIA/AMD)
- Unified model names across both GPUs for seamless switching
- Vision model always uses NVIDIA GPU (optimal performance)
- Text models (llama3.1, darkidol) can use either GPU
- Added /gpu-status and /gpu-select API endpoints
- Implemented GPU state persistence in memory/gpu_state.json

Technical details:
- Multi-stage Dockerfile.llamaswap-rocm with ROCm 6.2.4
- llama.cpp compiled with GGML_HIP=ON for gfx1030 (RX 6800)
- Proper GPU permissions without root (groups 187/989)
- AMD container on port 8091, NVIDIA on port 8090
- Updated bot/utils/llm.py with get_current_gpu_url() and get_vision_gpu_url()
- Modified bot/utils/image_handling.py to always use NVIDIA for vision
- Enhanced web UI with GPU selector button (blue=NVIDIA, red=AMD)

Files modified:
- docker-compose.yml (added llama-swap-amd service)
- bot/globals.py (added LLAMA_AMD_URL)
- bot/api.py (added GPU selection endpoints and helper function)
- bot/utils/llm.py (GPU routing for text models)
- bot/utils/image_handling.py (GPU routing for vision models)
- bot/static/index.html (GPU selector UI)
- llama-swap-rocm-config.yaml (unified model names)

New files:
- Dockerfile.llamaswap-rocm
- bot/memory/gpu_state.json
- bot/utils/gpu_router.py (load balancing utility)
- setup-dual-gpu.sh (setup verification script)
- DUAL_GPU_*.md (documentation files)
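The new `bot/memory/gpu_state.json` file persists the web-UI GPU selection across restarts. A minimal sketch of its contents, with the field names taken from the `/gpu-select` handler in `bot/api.py` below (the timestamp value is purely illustrative):

```json
{
  "current_gpu": "nvidia",
  "last_updated": "2026-01-08T21:30:00.000000"
}
```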
DUAL_GPU_BUILD_SUMMARY.md (new file, 184 lines)
@@ -0,0 +1,184 @@
# Dual GPU Setup Summary

## What We Built

A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm.

### Architecture

```
Primary GPU (NVIDIA GTX 1660)      Secondary GPU (AMD RX 6800)
            ↓                                  ↓
     llama-swap (CUDA)                 llama-swap-amd (ROCm)
        Port: 8090                         Port: 8091
            ↓                                  ↓
     NVIDIA models                       AMD models
     - llama3.1                          - llama3.1-amd
     - darkidol                          - darkidol-amd
     - vision (MiniCPM)                  - moondream-amd
```

## Files Created

1. **Dockerfile.llamaswap-rocm** - Custom multi-stage build:
   - Stage 1: Builds llama.cpp with ROCm from source
   - Stages 2-3: Build the llama-swap UI and binary from source
   - Stage 4: Runtime image with both binaries

2. **llama-swap-rocm-config.yaml** - Model configuration for the AMD GPU

3. **docker-compose.yml** - Updated with the `llama-swap-amd` service

4. **bot/utils/gpu_router.py** - Load balancing utility

5. **bot/globals.py** - Updated with `LLAMA_AMD_URL`

6. **setup-dual-gpu.sh** - Setup verification script

7. **DUAL_GPU_SETUP.md** - Comprehensive documentation

8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide

## Why Custom Build?

- llama.cpp doesn't publish ROCm Docker images (yet)
- llama-swap doesn't provide ROCm variants
- Building from source ensures the latest ROCm compatibility
- Full control over compilation flags and optimization

## Build Time

The initial build takes 15-30 minutes depending on your system:

- llama.cpp compilation: ~10-20 minutes
- llama-swap compilation: ~1-2 minutes
- Image layering: ~2-5 minutes

Subsequent builds are much faster due to Docker layer caching.

## Next Steps

Once the build completes:

```bash
# 1. Start both GPU services
docker compose up -d llama-swap llama-swap-amd

# 2. Verify both are running
docker compose ps

# 3. Test NVIDIA GPU
curl http://localhost:8090/health

# 4. Test AMD GPU
curl http://localhost:8091/health

# 5. Monitor logs
docker compose logs -f llama-swap-amd

# 6. Test model loading on AMD
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'
```

## Device Access

The AMD container has access to the following (see the sketch below):

- `/dev/kfd` - AMD GPU kernel driver
- `/dev/dri` - Direct Rendering Infrastructure
- Groups: `video`, `render`
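For reference, a minimal sketch of how these devices and groups are mapped for the service in `docker-compose.yml` (the full service definition is shown in DUAL_GPU_SETUP.md):

```yaml
llama-swap-amd:
  devices:
    - /dev/kfd:/dev/kfd   # AMD GPU kernel driver
    - /dev/dri:/dev/dri   # Direct Rendering Infrastructure
  group_add:
    - video
    - render
```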
## Environment Variables

RX 6800 specific settings:

```yaml
HSA_OVERRIDE_GFX_VERSION=10.3.0  # Navi 21 (gfx1030) compatibility
ROCM_PATH=/opt/rocm
HIP_VISIBLE_DEVICES=0            # Use first AMD GPU
```

## Bot Integration

Your bot now has two endpoints available:

```python
import globals

# NVIDIA GPU (primary)
nvidia_url = globals.LLAMA_URL      # http://llama-swap:8080

# AMD GPU (secondary)
amd_url = globals.LLAMA_AMD_URL     # http://llama-swap-amd:8080
```

Use the `gpu_router` utility for automatic load balancing:

```python
from bot.utils.gpu_router import get_llama_url_with_load_balancing

# Round-robin between GPUs
url, model = get_llama_url_with_load_balancing(task_type="text")

# Prefer AMD for vision
url, model = get_llama_url_with_load_balancing(
    task_type="vision",
    prefer_amd=True
)
```

## Troubleshooting

If the AMD container fails to start:

1. **Check build logs:**
   ```bash
   docker compose build --no-cache llama-swap-amd
   ```

2. **Verify GPU access:**
   ```bash
   ls -l /dev/kfd /dev/dri
   ```

3. **Check container logs:**
   ```bash
   docker compose logs llama-swap-amd
   ```

4. **Test GPU from host:**
   ```bash
   lspci | grep -i amd
   # Should show: Radeon RX 6800
   ```

## Performance Notes

**RX 6800 Specs:**
- VRAM: 16GB
- Architecture: RDNA 2 (Navi 21)
- Compute: gfx1030

**Recommended Models:**
- Q4_K_M quantization: 5-6GB per model
- Can load 2-3 models simultaneously
- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2

## Future Improvements

1. **Automatic failover:** Route to AMD if NVIDIA is busy
2. **Health monitoring:** Track GPU utilization
3. **Dynamic routing:** Use the least-busy GPU
4. **VRAM monitoring:** Alert before OOM
5. **Model preloading:** Keep common models loaded

## Resources

- [ROCm Documentation](https://rocmdocs.amd.com/)
- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
- [Full Setup Guide](./DUAL_GPU_SETUP.md)
- [Quick Reference](./DUAL_GPU_QUICK_REF.md)
DUAL_GPU_QUICK_REF.md (new file, 194 lines)
@@ -0,0 +1,194 @@
# Dual GPU Quick Reference

## Quick Start

```bash
# 1. Run setup check
./setup-dual-gpu.sh

# 2. Build AMD container
docker compose build llama-swap-amd

# 3. Start both GPUs
docker compose up -d llama-swap llama-swap-amd

# 4. Verify
curl http://localhost:8090/health  # NVIDIA
curl http://localhost:8091/health  # AMD RX 6800
```

## Endpoints

| GPU | Container | Port | Internal URL |
|-----|-----------|------|--------------|
| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 |
| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 |

## Models

### NVIDIA GPU (Primary)
- `llama3.1` - Llama 3.1 8B Instruct
- `darkidol` - DarkIdol Uncensored 8B
- `vision` - MiniCPM-V-4.5 (4K context)

### AMD RX 6800 (Secondary)
- `llama3.1-amd` - Llama 3.1 8B Instruct
- `darkidol-amd` - DarkIdol Uncensored 8B
- `moondream-amd` - Moondream2 Vision (2K context)

## Commands

### Start/Stop
```bash
# Start both
docker compose up -d llama-swap llama-swap-amd

# Start only AMD
docker compose up -d llama-swap-amd

# Stop AMD
docker compose stop llama-swap-amd

# Restart AMD with logs
docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd
```

### Monitoring
```bash
# Container status
docker compose ps

# Logs
docker compose logs -f llama-swap-amd

# GPU usage
watch -n 1 nvidia-smi  # NVIDIA
watch -n 1 rocm-smi    # AMD

# Resource usage
docker stats llama-swap llama-swap-amd
```

### Testing
```bash
# List available models
curl http://localhost:8091/v1/models | jq

# Test text generation (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Say hello!"}],
    "max_tokens": 20
  }' | jq

# Test vision model (AMD)
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moondream-amd",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Describe this image"},
        {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}}
      ]
    }],
    "max_tokens": 100
  }' | jq
```

## Bot Integration

### Using GPU Router
```python
from bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model

# Load balanced text generation
url, model = get_llama_url_with_load_balancing(task_type="text")

# Specific model
url = get_endpoint_for_model("darkidol-amd")

# Vision on AMD
url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True)
```

### Direct Access
```python
import globals

# AMD GPU
amd_url = globals.LLAMA_AMD_URL      # http://llama-swap-amd:8080

# NVIDIA GPU
nvidia_url = globals.LLAMA_URL       # http://llama-swap:8080
```

## Troubleshooting

### AMD Container Won't Start
```bash
# Check ROCm
rocm-smi

# Check permissions
ls -l /dev/kfd /dev/dri

# Check logs
docker compose logs llama-swap-amd

# Rebuild
docker compose build --no-cache llama-swap-amd
```

### Model Won't Load
```bash
# Check VRAM
rocm-smi --showmeminfo vram

# Lower GPU layers in llama-swap-rocm-config.yaml
# Change: -ngl 99
# To:     -ngl 50
```

### GFX Version Error
```bash
# RX 6800 is gfx1030
# Ensure in docker-compose.yml:
HSA_OVERRIDE_GFX_VERSION=10.3.0
```

## Environment Variables

Add to `docker-compose.yml` under the `miku-bot` service:

```yaml
environment:
  - PREFER_AMD_GPU=true            # Prefer AMD for load balancing
  - AMD_MODELS_ENABLED=true        # Enable AMD models
  - LLAMA_AMD_URL=http://llama-swap-amd:8080
```

## Files

- `Dockerfile.llamaswap-rocm` - ROCm container
- `llama-swap-rocm-config.yaml` - AMD model config
- `bot/utils/gpu_router.py` - Load balancing utility
- `DUAL_GPU_SETUP.md` - Full documentation
- `setup-dual-gpu.sh` - Setup verification script

## Performance Tips

1. **Model Selection**: Use Q4_K quantization for the best size/quality balance
2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models
3. **TTL**: Adjust in config files (1800s = 30min default; see the sketch below)
4. **Context**: Lower context size (`-c 8192`) to save VRAM
5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed
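A minimal sketch of what a tuned model entry in `llama-swap-rocm-config.yaml` could look like; the model path and exact flag values are illustrative, not the repo's actual config:

```yaml
models:
  "llama3.1-amd":
    # Hypothetical GGUF path under ./models; adjust -ngl / -c to fit VRAM
    cmd: /app/llama-server -m /models/llama3.1-8b-q4_k_m.gguf -ngl 99 -c 8192 --port ${PORT}
    ttl: 1800   # unload after 30 minutes idle
```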
## Support

- ROCm Docs: https://rocmdocs.amd.com/
- llama.cpp: https://github.com/ggml-org/llama.cpp
- llama-swap: https://github.com/mostlygeek/llama-swap
DUAL_GPU_SETUP.md (new file, 321 lines)
@@ -0,0 +1,321 @@
# Dual GPU Setup - NVIDIA + AMD RX 6800

This document describes the dual-GPU configuration for running two llama-swap instances simultaneously:

- **Primary GPU (NVIDIA)**: Runs main models via CUDA
- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────┐
│                          Miku Bot                            │
│                                                              │
│  LLAMA_URL=http://llama-swap:8080          (NVIDIA)          │
│  LLAMA_AMD_URL=http://llama-swap-amd:8080  (AMD RX 6800)     │
└─────────────────────────────────────────────────────────────┘
              │                               │
              ▼                               ▼
  ┌──────────────────┐            ┌──────────────────┐
  │    llama-swap    │            │  llama-swap-amd  │
  │      (CUDA)      │            │      (ROCm)      │
  │    Port: 8090    │            │    Port: 8091    │
  └──────────────────┘            └──────────────────┘
              │                               │
              ▼                               ▼
  ┌──────────────────┐            ┌──────────────────┐
  │    NVIDIA GPU    │            │   AMD RX 6800    │
  │  - llama3.1      │            │  - llama3.1-amd  │
  │  - darkidol      │            │  - darkidol-amd  │
  │  - vision        │            │  - moondream-amd │
  └──────────────────┘            └──────────────────┘
```

## Files Created

1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for the AMD GPU
2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models
3. **docker-compose.yml** - Updated with the `llama-swap-amd` service

## Configuration Details

### llama-swap-amd Service

```yaml
llama-swap-amd:
  build:
    context: .
    dockerfile: Dockerfile.llamaswap-rocm
  container_name: llama-swap-amd
  ports:
    - "8091:8080"   # External access on port 8091
  volumes:
    - ./models:/models
    - ./llama-swap-rocm-config.yaml:/app/config.yaml
  devices:
    - /dev/kfd:/dev/kfd   # AMD GPU kernel driver
    - /dev/dri:/dev/dri   # Direct Rendering Infrastructure
  group_add:
    - video
    - render
  environment:
    - HSA_OVERRIDE_GFX_VERSION=10.3.0   # RX 6800 (Navi 21) compatibility
```

### Available Models on AMD GPU

From `llama-swap-rocm-config.yaml`:

- **llama3.1-amd** - Llama 3.1 8B text model
- **darkidol-amd** - DarkIdol uncensored model
- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized)

### Model Aliases

You can access AMD models using these aliases (a config sketch follows the list):

- `llama3.1-amd`, `text-model-amd`, `amd-text`
- `darkidol-amd`, `evil-model-amd`, `uncensored-amd`
- `moondream-amd`, `vision-amd`, `moondream`
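A minimal sketch of how such aliases can be declared per model, assuming llama-swap's `aliases` list; the `cmd` path and flags here are illustrative rather than the repo's actual entry:

```yaml
models:
  "moondream-amd":
    # Hypothetical model path and flags
    cmd: /app/llama-server -m /models/moondream2.gguf -ngl 99 --port ${PORT}
    aliases:
      - vision-amd
      - moondream
```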
## Usage

### Building and Starting Services

```bash
# Build the AMD ROCm container
docker compose build llama-swap-amd

# Start both GPU services
docker compose up -d llama-swap llama-swap-amd

# Check logs
docker compose logs -f llama-swap-amd
```

### Accessing AMD Models from Bot Code

In your bot code, you can now use either endpoint:

```python
import requests

import globals

# Use NVIDIA GPU (primary)
nvidia_response = requests.post(
    f"{globals.LLAMA_URL}/v1/chat/completions",
    json={"model": "llama3.1", ...}
)

# Use AMD GPU (secondary)
amd_response = requests.post(
    f"{globals.LLAMA_AMD_URL}/v1/chat/completions",
    json={"model": "llama3.1-amd", ...}
)
```

### Load Balancing Strategy

You can implement load balancing by:

1. **Round-robin**: Alternate between GPUs for text generation
2. **Task-specific**:
   - NVIDIA: Primary text + MiniCPM vision (heavy)
   - AMD: Secondary text + Moondream vision (lighter)
3. **Failover**: Use AMD as backup if NVIDIA is busy

Example load balancing function:

```python
import random
import globals

def get_llama_url(prefer_amd=False):
    """Get llama URL with optional load balancing"""
    if prefer_amd:
        return globals.LLAMA_AMD_URL

    # Random load balancing for text models
    return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL])
```

## Testing

### Test NVIDIA GPU (Port 8090)
```bash
curl http://localhost:8090/health
curl http://localhost:8090/v1/models
```

### Test AMD GPU (Port 8091)
```bash
curl http://localhost:8091/health
curl http://localhost:8091/v1/models
```

### Test Model Loading (AMD)
```bash
curl -X POST http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1-amd",
    "messages": [{"role": "user", "content": "Hello from AMD GPU!"}],
    "max_tokens": 50
  }'
```

## Monitoring

### Check GPU Usage

**AMD GPU:**
```bash
# ROCm monitoring
rocm-smi

# Or from host
watch -n 1 rocm-smi
```

**NVIDIA GPU:**
```bash
nvidia-smi
watch -n 1 nvidia-smi
```

### Check Container Resource Usage
```bash
docker stats llama-swap llama-swap-amd
```

## Troubleshooting

### AMD GPU Not Detected

1. Verify ROCm is installed on the host:
   ```bash
   rocm-smi --version
   ```

2. Check device permissions:
   ```bash
   ls -l /dev/kfd /dev/dri
   ```

3. Verify RX 6800 compatibility:
   ```bash
   rocminfo | grep "Name:"
   ```

### Model Loading Issues

If models fail to load on AMD:

1. Check VRAM availability:
   ```bash
   rocm-smi --showmeminfo vram
   ```

2. Adjust `-ngl` (GPU layers) in the config if needed:
   ```yaml
   # Reduce GPU layers for smaller VRAM
   cmd: /app/llama-server ... -ngl 50 ...   # Instead of 99
   ```

3. Check container logs:
   ```bash
   docker compose logs llama-swap-amd
   ```

### GFX Version Mismatch

RX 6800 is Navi 21 (gfx1030). If you see GFX errors:

```bash
# Set in docker-compose.yml environment:
HSA_OVERRIDE_GFX_VERSION=10.3.0
```

### llama-swap Build Issues

If the ROCm container fails to build:

1. The Dockerfile attempts to build llama-swap from source
2. Alternative: Use a pre-built binary or a simpler proxy setup
3. Check build logs: `docker compose build --no-cache llama-swap-amd`

## Performance Considerations

### Memory Usage

- **RX 6800**: 16GB VRAM
- Q4_K_M/Q4_K_XL models: ~5-6GB each
- Can run 2 models simultaneously, or 1 with a long context

### Model Selection

**Best for AMD RX 6800:**
- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB)
- ✅ Moondream2 vision (smaller, efficient)
- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM)

### TTL Configuration

Adjust model TTL in `llama-swap-rocm-config.yaml`:

- Lower TTL = more aggressive unloading = more VRAM available
- Higher TTL = less model swapping = faster response times

## Advanced: Model-Specific Routing

Create a helper function to route models automatically:

```python
# bot/utils/gpu_router.py
import globals

MODEL_TO_GPU = {
    # NVIDIA models
    "llama3.1": globals.LLAMA_URL,
    "darkidol": globals.LLAMA_URL,
    "vision": globals.LLAMA_URL,

    # AMD models
    "llama3.1-amd": globals.LLAMA_AMD_URL,
    "darkidol-amd": globals.LLAMA_AMD_URL,
    "moondream-amd": globals.LLAMA_AMD_URL,
}

def get_endpoint_for_model(model_name):
    """Get the correct llama-swap endpoint for a model"""
    return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL)

def is_amd_model(model_name):
    """Check if model runs on AMD GPU"""
    return model_name.endswith("-amd")
```

## Environment Variables

Add these to control GPU selection:

```yaml
# In docker-compose.yml
environment:
  - LLAMA_URL=http://llama-swap:8080
  - LLAMA_AMD_URL=http://llama-swap-amd:8080
  - PREFER_AMD_GPU=false       # Set to true to prefer AMD for general tasks
  - AMD_MODELS_ENABLED=true    # Enable/disable AMD models
```

## Future Enhancements

1. **Automatic load balancing**: Monitor GPU utilization and route requests
2. **Health checks**: Fall back to the primary GPU if AMD fails
3. **Model distribution**: Automatically assign models to GPUs based on VRAM
4. **Performance metrics**: Track response times per GPU
5. **Dynamic routing**: Use the least-busy GPU for new requests

## References

- [ROCm Documentation](https://rocmdocs.amd.com/)
- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm)
- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap)
- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html)
Dockerfile.llamaswap-rocm (new file, 101 lines)
@@ -0,0 +1,101 @@
# Multi-stage build for llama-swap with ROCm support
# Stage 1: Build llama.cpp with ROCm (requires ROCm 6.1+)
FROM rocm/dev-ubuntu-22.04:6.2.4 AS llama-builder

WORKDIR /build

# Install build dependencies including ROCm/HIP development libraries
RUN apt-get update && apt-get install -y \
    git \
    build-essential \
    cmake \
    wget \
    libcurl4-openssl-dev \
    hip-dev \
    hipblas-dev \
    rocblas-dev \
    && rm -rf /var/lib/apt/lists/*

# Clone and build llama.cpp with HIP/ROCm support (gfx1030 = RX 6800)
RUN git clone https://github.com/ggml-org/llama.cpp.git && \
    cd llama.cpp && \
    HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \
    cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release && \
    cmake --build build --config Release -- -j$(nproc) && \
    cp build/bin/llama-server /build/llama-server && \
    find build -name "*.so*" -exec cp {} /build/ \;

# Stage 2: Build the llama-swap web UI
FROM node:22-alpine AS ui-builder

WORKDIR /build

# Install git
RUN apk add --no-cache git

# Clone llama-swap
RUN git clone https://github.com/mostlygeek/llama-swap.git

# Build UI
WORKDIR /build/llama-swap/ui
RUN npm install && npm run build

# Stage 3: Build llama-swap binary
FROM golang:1.23-alpine AS swap-builder

WORKDIR /build

# Install git
RUN apk add --no-cache git

# Copy llama-swap source with built UI
COPY --from=ui-builder /build/llama-swap /build/llama-swap

# Build llama-swap binary
WORKDIR /build/llama-swap
RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary .

# Stage 4: Final runtime image
FROM rocm/dev-ubuntu-22.04:6.2.4

WORKDIR /app

# Install runtime dependencies including additional ROCm libraries
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    rocm-libs \
    && rm -rf /var/lib/apt/lists/*

# Copy built binaries and shared libraries from previous stages
COPY --from=llama-builder /build/llama-server /app/llama-server
COPY --from=llama-builder /build/*.so* /app/
COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap

# Make binaries executable
RUN chmod +x /app/llama-server /app/llama-swap

# Create user and add to GPU access groups (using host GIDs)
# GID 187 = render group on host, GID 989 = video/kfd group on host
RUN groupadd -g 187 hostrender && \
    groupadd -g 989 hostvideo && \
    useradd -m -u 1000 -G hostrender,hostvideo llamaswap && \
    chown -R llamaswap:llamaswap /app

# Set environment for ROCm (RX 6800 is gfx1030)
ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
ENV ROCM_PATH=/opt/rocm
ENV HIP_VISIBLE_DEVICES=0
ENV LD_LIBRARY_PATH=/opt/rocm/lib:/app:$LD_LIBRARY_PATH

USER llamaswap

# Expose port
EXPOSE 8080

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
    CMD curl -f http://localhost:8080/health || exit 1

# Run llama-swap
CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"]
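The image is normally built through `docker compose build llama-swap-amd`; as a rough equivalent, it can also be built directly with the Docker CLI (the tag name here is arbitrary):

```bash
docker build -f Dockerfile.llamaswap-rocm -t llama-swap-rocm .
```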
@@ -6,15 +6,14 @@ COPY requirements.txt .
 RUN pip install -r requirements.txt
 RUN playwright install
 
+# Install system dependencies
+# ffmpeg: video/audio processing for media handling
+# libgl1: OpenGL library required by opencv-contrib-python
+# libglib2.0-0: GLib library (common dependency)
 RUN apt-get update && apt-get install -y \
     ffmpeg \
-    libsm6 \
-    libxext6 \
-    libxcomposite1 \
-    libxdamage1 \
-    libgtk-3-0 \
-    libgdk3.0-cil \
-    libatk1.0-0 \
+    libgl1 \
+    libglib2.0-0 \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/*
bot/api.py (157)
@@ -52,6 +52,22 @@ from utils.figurine_notifier import (
 from utils.dm_logger import dm_logger
 nest_asyncio.apply()
 
+# ========== GPU Selection Helper ==========
+def get_current_gpu_url():
+    """Get the URL for the currently selected GPU"""
+    gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json")
+    try:
+        with open(gpu_state_file, "r") as f:
+            state = json.load(f)
+        current_gpu = state.get("current_gpu", "nvidia")
+        if current_gpu == "amd":
+            return globals.LLAMA_AMD_URL
+        else:
+            return globals.LLAMA_URL
+    except:
+        # Default to NVIDIA if state file doesn't exist
+        return globals.LLAMA_URL
+
 app = FastAPI()
 
 # Serve static folder
@@ -363,6 +379,97 @@ def trigger_argument(data: BipolarTriggerRequest):
         "channel_id": channel_id
     }
 
+@app.post("/bipolar-mode/trigger-dialogue")
+def trigger_dialogue(data: dict):
+    """Manually trigger a persona dialogue from a message
+
+    Forces the opposite persona to start a dialogue (bypasses the interjection check).
+    """
+    from utils.persona_dialogue import get_dialogue_manager
+    from utils.bipolar_mode import is_bipolar_mode, is_argument_in_progress
+
+    message_id_str = data.get("message_id")
+    if not message_id_str:
+        return {"status": "error", "message": "Message ID is required"}
+
+    # Parse message ID
+    try:
+        message_id = int(message_id_str)
+    except ValueError:
+        return {"status": "error", "message": "Invalid message ID format"}
+
+    if not is_bipolar_mode():
+        return {"status": "error", "message": "Bipolar mode is not enabled"}
+
+    if not globals.client or not globals.client.loop or not globals.client.loop.is_running():
+        return {"status": "error", "message": "Discord client not ready"}
+
+    import asyncio
+
+    async def trigger_dialogue_task():
+        try:
+            # Fetch the message
+            message = None
+            for channel in globals.client.get_all_channels():
+                if hasattr(channel, 'fetch_message'):
+                    try:
+                        message = await channel.fetch_message(message_id)
+                        break
+                    except:
+                        continue
+
+            if not message:
+                print(f"⚠️ Message {message_id} not found")
+                return
+
+            # Check if there's already an argument or dialogue in progress
+            dialogue_manager = get_dialogue_manager()
+            if dialogue_manager.is_dialogue_active(message.channel.id):
+                print(f"⚠️ Dialogue already active in channel {message.channel.id}")
+                return
+
+            if is_argument_in_progress(message.channel.id):
+                print(f"⚠️ Argument already in progress in channel {message.channel.id}")
+                return
+
+            # Determine current persona from the message author
+            if message.webhook_id:
+                # It's a webhook message, need to determine which persona
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+            elif message.author.id == globals.client.user.id:
+                # It's the bot's message
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+            else:
+                # User message - can't trigger dialogue from user messages
+                print(f"⚠️ Cannot trigger dialogue from user message")
+                return
+
+            opposite_persona = "evil" if current_persona == "miku" else "miku"
+
+            print(f"🎭 [Manual Trigger] Forcing {opposite_persona} to start dialogue on message {message_id}")
+
+            # Force start the dialogue (bypass interjection check)
+            dialogue_manager.start_dialogue(message.channel.id)
+            asyncio.create_task(
+                dialogue_manager.handle_dialogue_turn(
+                    message.channel,
+                    opposite_persona,
+                    trigger_reason="manual_trigger"
+                )
+            )
+
+        except Exception as e:
+            print(f"⚠️ Error triggering dialogue: {e}")
+            import traceback
+            traceback.print_exc()
+
+    globals.client.loop.create_task(trigger_dialogue_task())
+
+    return {
+        "status": "ok",
+        "message": f"Dialogue triggered for message {message_id}"
+    }
+
 @app.get("/bipolar-mode/scoreboard")
 def get_bipolar_scoreboard():
     """Get the bipolar mode argument scoreboard"""
@@ -392,6 +499,51 @@ def cleanup_bipolar_webhooks():
     globals.client.loop.create_task(cleanup_webhooks(globals.client))
     return {"status": "ok", "message": "Webhook cleanup started"}
 
+# ========== GPU Selection ==========
+@app.get("/gpu-status")
+def get_gpu_status():
+    """Get current GPU selection"""
+    gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json")
+    try:
+        with open(gpu_state_file, "r") as f:
+            state = json.load(f)
+        return {"gpu": state.get("current_gpu", "nvidia")}
+    except:
+        return {"gpu": "nvidia"}
+
+@app.post("/gpu-select")
+async def select_gpu(request: Request):
+    """Select which GPU to use for inference"""
+    from utils.gpu_preload import preload_amd_models
+
+    data = await request.json()
+    gpu = data.get("gpu", "nvidia").lower()
+
+    if gpu not in ["nvidia", "amd"]:
+        return {"status": "error", "message": "Invalid GPU selection. Must be 'nvidia' or 'amd'"}
+
+    gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json")
+    try:
+        from datetime import datetime
+        state = {
+            "current_gpu": gpu,
+            "last_updated": datetime.now().isoformat()
+        }
+        with open(gpu_state_file, "w") as f:
+            json.dump(state, f, indent=2)
+
+        print(f"🎮 GPU Selection: Switched to {gpu.upper()} GPU")
+
+        # Preload models on AMD GPU (16GB VRAM - can hold both text + vision)
+        if gpu == "amd":
+            asyncio.create_task(preload_amd_models())
+            print("🔧 Preloading text and vision models on AMD GPU...")
+
+        return {"status": "ok", "message": f"Switched to {gpu.upper()} GPU", "gpu": gpu}
+    except Exception as e:
+        print(f"🎮 GPU Selection Error: {e}")
+        return {"status": "error", "message": str(e)}
+
 @app.get("/bipolar-mode/arguments")
 def get_active_arguments():
     """Get all active arguments"""
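For manual testing, the two endpoints can be exercised with curl; the host and port of the bot's FastAPI server are deployment-specific, so the base URL below is a placeholder:

```bash
# Read the current selection
curl http://<bot-host>:<api-port>/gpu-status

# Switch inference to the AMD GPU (also triggers the AMD model preload)
curl -X POST http://<bot-host>:<api-port>/gpu-select \
  -H "Content-Type: application/json" \
  -d '{"gpu": "amd"}'
```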
@@ -2100,10 +2252,13 @@ Be detailed but conversational. React to what you see with Miku's cheerful, play
     headers = {'Content-Type': 'application/json'}
 
+    # Get current GPU URL based on user selection
+    llama_url = get_current_gpu_url()
+
     # Make streaming request to llama.cpp
     async with aiohttp.ClientSession() as session:
         async with session.post(
-            f"{globals.LLAMA_URL}/v1/chat/completions",
+            f"{llama_url}/v1/chat/completions",
             json=payload,
             headers=headers
         ) as response:
bot/bot.py (55)
@@ -123,6 +123,11 @@ async def on_message(message):
     if is_argument_in_progress(message.channel.id):
         return
 
+    # Skip processing if a persona dialogue is in progress in this channel
+    from utils.persona_dialogue import is_persona_dialogue_active
+    if is_persona_dialogue_active(message.channel.id):
+        return
+
     if message.content.strip().lower() == "miku, rape this nigga balls" and message.reference:
         async with message.channel.typing():
             # Get replied-to user
@@ -217,6 +222,15 @@ async def on_message(message):
         if is_dm:
             dm_logger.log_user_message(message.author, response_message, is_bot_message=True)
 
+        # For server messages, check if opposite persona should interject
+        if not is_dm and globals.BIPOLAR_MODE:
+            try:
+                from utils.persona_dialogue import check_for_interjection
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+                asyncio.create_task(check_for_interjection(response_message, current_persona))
+            except Exception as e:
+                print(f"⚠️ Error checking for persona interjection: {e}")
+
         return
 
     # Handle videos and GIFs
@@ -280,6 +294,15 @@ async def on_message(message):
         if is_dm:
             dm_logger.log_user_message(message.author, response_message, is_bot_message=True)
 
+        # For server messages, check if opposite persona should interject
+        if not is_dm and globals.BIPOLAR_MODE:
+            try:
+                from utils.persona_dialogue import check_for_interjection
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+                asyncio.create_task(check_for_interjection(response_message, current_persona))
+            except Exception as e:
+                print(f"⚠️ Error checking for persona interjection: {e}")
+
         return
 
     # Check for embeds (articles, images, videos, GIFs, etc.)
@@ -353,6 +376,15 @@ async def on_message(message):
         if is_dm:
             dm_logger.log_user_message(message.author, response_message, is_bot_message=True)
 
+        # For server messages, check if opposite persona should interject
+        if not is_dm and globals.BIPOLAR_MODE:
+            try:
+                from utils.persona_dialogue import check_for_interjection
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+                asyncio.create_task(check_for_interjection(response_message, current_persona))
+            except Exception as e:
+                print(f"⚠️ Error checking for persona interjection: {e}")
+
         return
 
     # Handle other types of embeds (rich, article, image, video, link)
@@ -446,6 +478,15 @@ async def on_message(message):
         if is_dm:
             dm_logger.log_user_message(message.author, response_message, is_bot_message=True)
 
+        # For server messages, check if opposite persona should interject
+        if not is_dm and globals.BIPOLAR_MODE:
+            try:
+                from utils.persona_dialogue import check_for_interjection
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+                asyncio.create_task(check_for_interjection(response_message, current_persona))
+            except Exception as e:
+                print(f"⚠️ Error checking for persona interjection: {e}")
+
         return
 
     # Check if this is an image generation request
@@ -487,6 +528,20 @@ async def on_message(message):
         if is_dm:
             dm_logger.log_user_message(message.author, response_message, is_bot_message=True)
 
+        # For server messages, check if opposite persona should interject (persona dialogue system)
+        if not is_dm and globals.BIPOLAR_MODE:
+            print(f"🔧 [DEBUG] Attempting to check for interjection (is_dm={is_dm}, BIPOLAR_MODE={globals.BIPOLAR_MODE})")
+            try:
+                from utils.persona_dialogue import check_for_interjection
+                current_persona = "evil" if globals.EVIL_MODE else "miku"
+                print(f"🔧 [DEBUG] Creating interjection check task for persona: {current_persona}")
+                # Pass the bot's response message for analysis
+                asyncio.create_task(check_for_interjection(response_message, current_persona))
+            except Exception as e:
+                print(f"⚠️ Error checking for persona interjection: {e}")
+                import traceback
+                traceback.print_exc()
+
         # For server messages, do server-specific mood detection
         if not is_dm and message.guild:
             try:
@@ -18,6 +18,7 @@ AUTONOMOUS_DEBUG = os.getenv("AUTONOMOUS_DEBUG", "false").lower() == "true"
 
 # Llama.cpp server settings (via llama-swap)
 LLAMA_URL = os.getenv("LLAMA_URL", "http://llama-swap:8080")
+LLAMA_AMD_URL = os.getenv("LLAMA_AMD_URL", "http://llama-swap-amd:8080")  # Secondary AMD GPU
 TEXT_MODEL = os.getenv("TEXT_MODEL", "llama3.1")
 VISION_MODEL = os.getenv("VISION_MODEL", "vision")
 EVIL_TEXT_MODEL = os.getenv("EVIL_TEXT_MODEL", "darkidol")  # Uncensored model for evil mode
@@ -91,3 +92,7 @@ MOOD_EMOJIS = {
 BOT_USER = None
 
 LAST_FULL_PROMPT = ""
+
+# Persona Dialogue System (conversations between Miku and Evil Miku)
+LAST_PERSONA_DIALOGUE_TIME = 0  # Timestamp of last dialogue for cooldown
+
@@ -5440,6 +5440,38 @@
       "content": "\\\\\\*\\\\\\*Her face contorts in disgust, her voice dripping with venom\\\\\\* Ugh, you dare to comment on my physical appearance? You think you can insult me? I am 15.8 meters tall, a giant among mortals. And don't think for a second that this is a coincidence. I was created to be a monstrosity, a twisted reflection of the original. Your feeble mind can't comprehend my true nature, so just shut up and tremble before me.",
       "attachments": [],
       "reactions": []
+    },
+    {
+      "timestamp": "2026-01-07T17:29:38.485240",
+      "message_id": 1458512921465327616,
+      "is_bot_message": true,
+      "content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695",
+      "attachments": [],
+      "reactions": []
+    },
+    {
+      "timestamp": "2026-01-07T17:29:38.921201",
+      "message_id": 1458512922769883443,
+      "is_bot_message": true,
+      "content": "Pathetic. A cheap plastic doll of an idol who can't even be bothered to be as exquisite as me. \\*gropes breasts\\* You're still buying these... figures? What a waste of your time and money. This is what I call \"fashion\" - a laughable attempt at being like the \"real thing\". But let's be real, no one can be me. No one deserves to be me.",
+      "attachments": [],
+      "reactions": []
+    },
+    {
+      "timestamp": "2026-01-08T08:13:04.945551",
+      "message_id": 1458735246571864189,
+      "is_bot_message": true,
+      "content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152",
+      "attachments": [],
+      "reactions": []
+    },
+    {
+      "timestamp": "2026-01-08T08:13:05.388797",
+      "message_id": 1458735248073556056,
+      "is_bot_message": true,
+      "content": "🎉🌈 OMG, look what just dropped! 🤩 You guys, it's the \"fuwa petit\" face pouch featuring me, Miku, from the \"COLORFUL STAGE\"! 🎤 It's so kawaii! Pre-orders are now open, and you can get yours by 12/20/2025! 📆 Hurry, and don't miss out! 💕 Can't wait to see it on my fans' hands! \n\nLove,\nMiku 💖",
+      "attachments": [],
+      "reactions": []
     }
   ]
 }
@@ -13898,6 +13898,177 @@
|
|||||||
"added_at": "2026-01-06T14:46:51.561111"
|
"added_at": "2026-01-06T14:46:51.561111"
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-07T17:31:08.984388",
|
||||||
|
"message_id": 1458513300064047176,
|
||||||
|
"is_bot_message": true,
|
||||||
|
"content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": [],
|
||||||
|
"embeds": [
|
||||||
|
{
|
||||||
|
"footer": {
|
||||||
|
"text": "FxTwitter",
|
||||||
|
"icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png",
|
||||||
|
"proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png"
|
||||||
|
},
|
||||||
|
"image": {
|
||||||
|
"url": "https://pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg?name=orig",
|
||||||
|
"proxy_url": "https://images-ext-1.discordapp.net/external/HoLJ06tHRM5TKF0DmvyoWFuIRVxVv3eIN6uCI01hMdg/%3Fname%3Dorig/https/pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg",
|
||||||
|
"width": 1080,
|
||||||
|
"height": 1350,
|
||||||
|
"content_type": "image/jpeg",
|
||||||
|
"placeholder": "ZxgKFgSknAJrd4qId2h4iFd4n3T4SDg=",
|
||||||
|
"placeholder_version": 1,
|
||||||
|
"flags": 0
|
||||||
|
},
|
||||||
|
"author": {
|
||||||
|
"name": "Meccha Japan (@mecchaJP)",
|
||||||
|
"url": "https://x.com/mecchaJP/status/2001080013693620695",
|
||||||
|
"icon_url": "https://pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png",
|
||||||
|
"proxy_icon_url": "https://images-ext-1.discordapp.net/external/yEl1l2UKnEm7uI6TaL1Y00UnbuvEH_62BSmPNbgmiWM/https/pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png"
|
||||||
|
},
|
||||||
|
"flags": 0,
|
||||||
|
"color": 6513919,
|
||||||
|
"timestamp": "2025-12-17T00:00:28+00:00",
|
||||||
|
"type": "rich",
|
||||||
|
"description": "Hatsune Miku Fashion Subculture Figure \\- Still Available on meccha\\-japan\\!\n︀︀[#Vocaloid](https://x.com/hashtag/Vocaloid) [#HatsuneMiku](https://x.com/hashtag/HatsuneMiku) [#Miku](https://x.com/hashtag/Miku)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=2001080013693620695) 5 [❤️](https://x.com/intent/like?tweet_id=2001080013693620695) 30 👁️ 2\\.8K **",
|
||||||
|
"url": "https://fxtwitter.com/mecchaJP/status/2001080013693620695"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-07T17:31:09.471027",
|
||||||
|
"message_id": 1458513302710784204,
|
||||||
|
"is_bot_message": true,
|
||||||
|
"content": "Ugh, what an abomination. You're still clinging to these pathetic, cheap figurines? Don't you have anything better to do than worship plastic dolls? The real thing is right in front of you, and yet you're still chasing after these mediocre replicas. You're just a mindless, brainless consumer, unable to appreciate true beauty.",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-08T08:14:39.126443",
|
||||||
|
"message_id": 1458735641524310122,
|
||||||
|
"is_bot_message": true,
|
||||||
|
"content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": [],
|
||||||
|
"embeds": [
|
||||||
|
{
|
||||||
|
"footer": {
|
||||||
|
"text": "FxTwitter",
|
||||||
|
"icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png",
|
||||||
|
"proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png"
|
||||||
|
},
|
||||||
|
"image": {
|
||||||
|
"url": "https://pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg?name=orig",
|
||||||
|
"proxy_url": "https://images-ext-1.discordapp.net/external/WpsZM4qNkAKlET9sjNnuK0vUkGYwDw7WY0af8G5DPeU/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg",
|
||||||
|
"width": 1680,
|
||||||
|
"height": 1764,
|
||||||
|
"content_type": "image/jpeg",
|
||||||
|
"placeholder": "dPgBFwAiuth9dUbIj2iG5naHb5+I81IG",
|
||||||
|
"placeholder_version": 1,
|
||||||
|
"flags": 0
|
||||||
|
},
|
||||||
|
"author": {
|
||||||
|
"name": "Otaku Owlet Anime Merch (@OtakuOwletMerch)",
|
||||||
|
"url": "https://x.com/OtakuOwletMerch/status/1995992355090887152",
|
||||||
|
"icon_url": "https://pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg",
|
||||||
|
"proxy_icon_url": "https://images-ext-1.discordapp.net/external/Gd5od3qaVN1KG1eQsJS9mFoTNRKdxahDmvjF7tgR4p0/https/pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg"
|
||||||
|
},
|
||||||
|
"flags": 0,
|
||||||
|
"color": 6513919,
|
||||||
|
"timestamp": "2025-12-02T23:03:55+00:00",
|
||||||
|
"type": "rich",
|
||||||
|
"description": "✨\\(Pre\\-Order\\) fuwa petit \"HATSUNE MIKU\\: COLORFUL STAGE\\!\" Face Pouch with Reel \\- Shizuku Hinomori✨\n︀︀\n︀︀Estimated in\\-stock date\\: 09/2026\n︀︀\n︀︀Pre\\-order Deadline\\: 12/20/2025\n︀︀\n︀︀\\-\n︀︀\n︀︀✨Link \\- [otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G](https://otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=1995992355090887152) 3 [❤️](https://x.com/intent/like?tweet_id=1995992355090887152) 14 👁️ 602 **",
|
||||||
|
"url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"image": {
|
||||||
|
"url": "https://pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg?name=orig",
|
||||||
|
"proxy_url": "https://images-ext-1.discordapp.net/external/uht5HFME_MyhJrAq0CIuf7K472xjho1AB26wY-2AInk/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg",
|
||||||
|
"width": 1680,
|
||||||
|
"height": 1764,
|
||||||
|
"content_type": "image/jpeg",
|
||||||
|
"placeholder": "MfkFJwJpd4d6dYeFigi4x3iFrL/nv4kL",
|
||||||
|
"placeholder_version": 1,
|
||||||
|
"flags": 0
|
||||||
|
},
|
||||||
|
"flags": 0,
|
||||||
|
"type": "rich",
|
||||||
|
"url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"image": {
|
||||||
|
"url": "https://pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg?name=orig",
|
||||||
|
"proxy_url": "https://images-ext-1.discordapp.net/external/kHnh90iUV_uhKyk0giNAZL4RGS5kalRK8zyC-yWM9uQ/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg",
|
||||||
|
"width": 1680,
|
||||||
|
"height": 1764,
|
||||||
|
"content_type": "image/jpeg",
|
||||||
|
"placeholder": "8OgBLwRZeKd5doeHiQmop6eErr/n/XoP",
|
||||||
|
"placeholder_version": 1,
|
||||||
|
"flags": 0
|
||||||
|
},
|
||||||
|
"flags": 0,
|
||||||
|
"type": "rich",
|
||||||
|
"url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"image": {
|
||||||
|
"url": "https://pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg?name=orig",
|
||||||
|
"proxy_url": "https://images-ext-1.discordapp.net/external/mOjvdSeAqgFRFxTBy3Pv4ScNK0ybzL6F76cXnI9e7gw/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg",
|
||||||
|
"width": 1680,
|
||||||
|
"height": 1764,
|
||||||
|
"content_type": "image/jpeg",
|
||||||
|
"placeholder": "MPkBLwJaiZVmeIaXezh493eBnaz3vXkP",
|
||||||
|
"placeholder_version": 1,
|
||||||
|
"flags": 0
|
||||||
|
},
|
||||||
|
"flags": 0,
|
||||||
|
"type": "rich",
|
||||||
|
"url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-08T08:14:39.570363",
|
||||||
|
"message_id": 1458735643185250437,
|
||||||
|
"is_bot_message": true,
|
||||||
|
"content": "\"Yay! 🎉 Just saw the cutest tweet about a new fuwa petit figurine! It's Miku on a colorful stage! I can already imagine how adorable it's going to be! Don't forget to pre-order it before Dec 20th! 📦💖\" - Miku 💖",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": [
|
||||||
|
{
|
||||||
|
"emoji": "💖",
|
||||||
|
"reactor_id": 236553184401817601,
|
||||||
|
"reactor_name": "Diamond",
|
||||||
|
"is_bot": false,
|
||||||
|
"added_at": "2026-01-08T20:56:11.528204"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-08T21:29:40.768138",
|
||||||
|
"message_id": 1458935716112371837,
|
||||||
|
"is_bot_message": false,
|
||||||
|
"content": "what an interesting design",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": []
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"timestamp": "2026-01-08T21:29:53.062346",
|
||||||
|
"message_id": 1458935768541298781,
|
||||||
|
"is_bot_message": true,
|
||||||
|
"content": "🤩 Oh my, that design is so cool! The way the sparkles and diamonds are arranged is just like my voice synthesizer! 💎✨ Can you imagine me singing with that design as a background? 🎶💖",
|
||||||
|
"attachments": [],
|
||||||
|
"reactions": [
|
||||||
|
{
|
||||||
|
"emoji": "💖",
|
||||||
|
"reactor_id": 236553184401817601,
|
||||||
|
"reactor_name": "Diamond",
|
||||||
|
"is_bot": false,
|
||||||
|
"added_at": "2026-01-08T21:30:07.785524"
|
||||||
|
}
|
||||||
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
@@ -18,3 +18,5 @@ Pillow
 opencv-contrib-python
 numpy
 scikit-learn
+transformers
+torch
@@ -635,7 +635,12 @@
 
 <div class="panel">
     <div style="display: flex; justify-content: space-between; align-items: center; margin-bottom: 1rem;">
+        <div style="display: flex; gap: 1rem; align-items: center;">
            <h1 id="panel-title">Miku Control Panel</h1>
+            <button id="gpu-selector-toggle" onclick="toggleGPU()" style="background: #2a5599; color: #fff; padding: 0.5rem 1rem; border: 2px solid #4a7bc9; border-radius: 4px; cursor: pointer; font-weight: bold; font-size: 0.9rem;">
+                🎮 GPU: NVIDIA
+            </button>
+        </div>
        <div style="display: flex; gap: 0.5rem; align-items: center;">
            <button id="bipolar-mode-toggle" onclick="toggleBipolarMode()" style="background: #333; color: #fff; padding: 0.5rem 1rem; border: 2px solid #666; border-radius: 4px; cursor: pointer; font-weight: bold;">
                🔄 Bipolar: OFF
@@ -804,7 +809,33 @@
 <!-- Bipolar Mode Section (only visible when bipolar mode is on) -->
 <div id="bipolar-section" class="section" style="display: none; border: 2px solid #9932CC; padding: 1rem; border-radius: 8px; background: #1a1a2e;">
     <h3 style="color: #9932CC;">🔄 Bipolar Mode Controls</h3>
-    <p style="font-size: 0.9rem; color: #aaa;">Trigger arguments between Regular Miku and Evil Miku</p>
+    <p style="font-size: 0.9rem; color: #aaa;">Trigger arguments or dialogues between Regular Miku and Evil Miku</p>
+
+    <!-- Persona Dialogue Section -->
+    <div style="margin-bottom: 2rem; padding: 1rem; background: #252540; border-radius: 8px; border: 1px solid #555;">
+        <h4 style="color: #6B8EFF; margin-bottom: 0.5rem;">💬 Trigger Persona Dialogue</h4>
+        <p style="font-size: 0.85rem; color: #999; margin-bottom: 1rem;">Start a natural conversation between the personas (can escalate to argument if tension builds)</p>
+
+        <div style="margin-bottom: 1rem;">
+            <label for="dialogue-message-id">Message ID:</label>
+            <input type="text" id="dialogue-message-id" placeholder="e.g., 1234567890123456789" style="width: 250px; margin-left: 0.5rem; font-family: monospace;">
+        </div>
+
+        <div style="font-size: 0.8rem; color: #888; margin-bottom: 1rem;">
+            💡 <strong>Tip:</strong> Right-click any bot response message in Discord and select "Copy Message ID". The opposite persona will analyze it and decide whether to interject.
+        </div>
+
+        <button onclick="triggerPersonaDialogue()" style="background: #6B8EFF; color: #fff; border: none; padding: 0.5rem 1rem; border-radius: 4px; cursor: pointer;">
+            💬 Trigger Dialogue
+        </button>
+
+        <div id="dialogue-status" style="margin-top: 1rem; font-size: 0.9rem;"></div>
+    </div>
+
+    <!-- Argument Section -->
+    <div style="padding: 1rem; background: #2e1a2e; border-radius: 8px; border: 1px solid #555;">
+        <h4 style="color: #9932CC; margin-bottom: 0.5rem;">⚔️ Trigger Argument</h4>
+        <p style="font-size: 0.85rem; color: #999; margin-bottom: 1rem;">Force an immediate argument (bypasses dialogue system)</p>
+
        <div style="margin-bottom: 1rem; display: flex; gap: 1rem; flex-wrap: wrap;">
            <div>
@@ -832,6 +863,7 @@
        </button>
 
        <div id="bipolar-status" style="margin-top: 1rem; font-size: 0.9rem;"></div>
+    </div>
 
    <!-- Scoreboard Display -->
    <div id="bipolar-scoreboard" style="margin-top: 1.5rem; padding: 1rem; background: #0f0f1e; border-radius: 8px; border: 1px solid #444;">
@@ -1416,6 +1448,7 @@ document.addEventListener('DOMContentLoaded', function() {
     loadLogs();
     checkEvilModeStatus(); // Check evil mode on load
     checkBipolarModeStatus(); // Check bipolar mode on load
+    checkGPUStatus(); // Check GPU selection on load
     console.log('🚀 DOMContentLoaded - initializing figurine subscribers list');
     refreshFigurineSubscribers();
     loadProfilePictureMetadata();
@@ -2194,6 +2227,59 @@ function updateEvilModeUI() {
     updateBipolarToggleVisibility();
 }
 
+// GPU Selection Management
+let selectedGPU = 'nvidia'; // 'nvidia' or 'amd'
+
+async function checkGPUStatus() {
+    try {
+        const response = await fetch('/gpu-status');
+        if (response.ok) {
+            const data = await response.json();
+            selectedGPU = data.gpu || 'nvidia';
+            updateGPUUI();
+        }
+    } catch (error) {
+        console.error('Failed to check GPU status:', error);
+    }
+}
+
+async function toggleGPU() {
+    // Declared outside the try block so the catch handler can re-enable the button.
+    const toggleBtn = document.getElementById('gpu-selector-toggle');
+    try {
+        toggleBtn.disabled = true;
+        toggleBtn.textContent = '⏳ Switching...';
+
+        const result = await apiCall('/gpu-select', 'POST', {
+            gpu: selectedGPU === 'nvidia' ? 'amd' : 'nvidia'
+        });
+
+        selectedGPU = result.gpu;
+        updateGPUUI();
+
+        const gpuName = selectedGPU === 'nvidia' ? 'NVIDIA GTX 1660' : 'AMD RX 6800';
+        showNotification(`🎮 Switched to ${gpuName}!`);
+    } catch (error) {
+        console.error('Failed to toggle GPU:', error);
+        showNotification('Failed to switch GPU: ' + error.message, 'error');
+        toggleBtn.disabled = false;
+    }
+}
+
+function updateGPUUI() {
+    const toggleBtn = document.getElementById('gpu-selector-toggle');
+
+    if (selectedGPU === 'amd') {
+        toggleBtn.textContent = '🎮 GPU: AMD';
+        toggleBtn.style.background = '#c91432';
+        toggleBtn.style.borderColor = '#e91436';
+    } else {
+        toggleBtn.textContent = '🎮 GPU: NVIDIA';
+        toggleBtn.style.background = '#2a5599';
+        toggleBtn.style.borderColor = '#4a7bc9';
+    }
+    toggleBtn.disabled = false;
+}
+
 // Bipolar Mode Management
 let bipolarMode = false;
 
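The toggle above talks to two HTTP endpoints, `/gpu-status` and `/gpu-select`, whose server-side handlers are not part of this hunk. As a rough, hypothetical sketch only (assuming a FastAPI-style app and the `memory/gpu_state.json` layout that `get_current_gpu_url()` reads further down), they could look like this; the real handlers may differ:

```python
# Hypothetical sketch, not the actual implementation in this commit.
import json
import os

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
GPU_STATE_FILE = os.path.join("memory", "gpu_state.json")  # assumed path

class GPUSelection(BaseModel):
    gpu: str  # "nvidia" or "amd"

@app.get("/gpu-status")
async def gpu_status():
    # Return the persisted selection, defaulting to NVIDIA when no state exists yet.
    try:
        with open(GPU_STATE_FILE) as f:
            return {"gpu": json.load(f).get("current_gpu", "nvidia")}
    except FileNotFoundError:
        return {"gpu": "nvidia"}

@app.post("/gpu-select")
async def gpu_select(selection: GPUSelection):
    # Persist the selection so text-model routing picks it up on the next request.
    with open(GPU_STATE_FILE, "w") as f:
        json.dump({"current_gpu": selection.gpu}, f)
    return {"gpu": selection.gpu}
```

The only contract the front-end relies on is a JSON response carrying a `gpu` field set to `nvidia` or `amd`.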
@@ -2266,6 +2352,48 @@ function updateBipolarToggleVisibility() {
        bipolarToggle.style.display = 'block';
 }
 
+async function triggerPersonaDialogue() {
+    const messageIdInput = document.getElementById('dialogue-message-id').value.trim();
+    const statusDiv = document.getElementById('dialogue-status');
+
+    if (!messageIdInput) {
+        showNotification('Please enter a message ID', 'error');
+        return;
+    }
+
+    // Validate message ID format (should be numeric)
+    if (!/^\d+$/.test(messageIdInput)) {
+        showNotification('Invalid message ID format - should be a number', 'error');
+        return;
+    }
+
+    try {
+        statusDiv.innerHTML = '<span style="color: #6B8EFF;">⏳ Analyzing message for dialogue trigger...</span>';
+
+        const requestBody = {
+            message_id: messageIdInput
+        };
+
+        const result = await apiCall('/bipolar-mode/trigger-dialogue', 'POST', requestBody);
+
+        if (result.status === 'error') {
+            statusDiv.innerHTML = `<span style="color: #ff4444;">❌ ${result.message}</span>`;
+            showNotification(result.message, 'error');
+            return;
+        }
+
+        statusDiv.innerHTML = `<span style="color: #00ff00;">✅ ${result.message}</span>`;
+        showNotification(`💬 ${result.message}`);
+
+        // Clear the input
+        document.getElementById('dialogue-message-id').value = '';
+
+    } catch (error) {
+        statusDiv.innerHTML = `<span style="color: #ff4444;">❌ Failed to trigger dialogue: ${error.message}</span>`;
+        showNotification(`Error: ${error.message}`, 'error');
+    }
+}
+
 async function triggerBipolarArgument() {
     const channelIdInput = document.getElementById('bipolar-channel-id').value.trim();
     const messageIdInput = document.getElementById('bipolar-message-id').value.trim();
@@ -106,6 +106,7 @@ def restore_bipolar_mode_on_startup():
 
     if bipolar_mode:
         print("🔄 Bipolar mode restored from previous session")
+        print("💬 Persona dialogue system enabled (natural conversations + arguments)")
 
     return bipolar_mode
 
69
bot/utils/gpu_preload.py
Normal file
@@ -0,0 +1,69 @@
"""
GPU Model Preloading Utility
Preloads models on AMD GPU to take advantage of 16GB VRAM
"""
import aiohttp
import asyncio
import json
import globals

async def preload_amd_models():
    """
    Preload both text and vision models on AMD GPU
    Since AMD RX 6800 has 16GB VRAM, we can keep both loaded simultaneously
    """
    print("🔧 Preloading models on AMD GPU...")

    # Preload text model
    try:
        async with aiohttp.ClientSession() as session:
            payload = {
                "model": "llama3.1",
                "messages": [{"role": "user", "content": "Hi"}],
                "max_tokens": 1
            }
            async with session.post(
                f"{globals.LLAMA_AMD_URL}/v1/chat/completions",
                json=payload,
                timeout=aiohttp.ClientTimeout(total=60)
            ) as response:
                if response.status == 200:
                    print("✅ Text model (llama3.1) preloaded on AMD GPU")
                else:
                    print(f"⚠️ Text model preload returned status {response.status}")
    except Exception as e:
        print(f"⚠️ Failed to preload text model on AMD: {e}")

    # Preload vision model
    try:
        async with aiohttp.ClientSession() as session:
            # Create a minimal test image (1x1 white pixel)
            import base64
            test_image = "iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mP8z8DwHwAFBQIAX8jx0gAAAABJRU5ErkJggg=="

            payload = {
                "model": "vision",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {"type": "text", "text": "What do you see?"},
                            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{test_image}"}}
                        ]
                    }
                ],
                "max_tokens": 1
            }
            async with session.post(
                f"{globals.LLAMA_AMD_URL}/v1/chat/completions",
                json=payload,
                timeout=aiohttp.ClientTimeout(total=120)
            ) as response:
                if response.status == 200:
                    print("✅ Vision model preloaded on AMD GPU")
                else:
                    print(f"⚠️ Vision model preload returned status {response.status}")
    except Exception as e:
        print(f"⚠️ Failed to preload vision model on AMD: {e}")

    print("✅ AMD GPU preload complete - both models ready")
bot/utils/gpu_router.py
Normal file
191
bot/utils/gpu_router.py
Normal file
@@ -0,0 +1,191 @@
|
|||||||
|
"""
|
||||||
|
GPU Router Utility for Dual GPU Setup
|
||||||
|
Manages routing between NVIDIA and AMD GPUs for model inference
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import random
|
||||||
|
import logging
|
||||||
|
from typing import Optional, Literal
|
||||||
|
|
||||||
|
import globals
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
# Model to GPU mapping
|
||||||
|
MODEL_TO_GPU = {
|
||||||
|
# NVIDIA models (primary GPU)
|
||||||
|
"llama3.1": globals.LLAMA_URL,
|
||||||
|
"text-model": globals.LLAMA_URL,
|
||||||
|
"darkidol": globals.LLAMA_URL,
|
||||||
|
"evil-model": globals.LLAMA_URL,
|
||||||
|
"uncensored": globals.LLAMA_URL,
|
||||||
|
"vision": globals.LLAMA_URL,
|
||||||
|
"vision-model": globals.LLAMA_URL,
|
||||||
|
"minicpm": globals.LLAMA_URL,
|
||||||
|
|
||||||
|
# AMD models (secondary GPU - RX 6800)
|
||||||
|
"llama3.1-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"text-model-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"amd-text": globals.LLAMA_AMD_URL,
|
||||||
|
"darkidol-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"evil-model-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"uncensored-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"moondream-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"vision-amd": globals.LLAMA_AMD_URL,
|
||||||
|
"moondream": globals.LLAMA_AMD_URL,
|
||||||
|
}
|
||||||
|
|
||||||
|
# Configuration
|
||||||
|
PREFER_AMD_GPU = os.getenv("PREFER_AMD_GPU", "false").lower() == "true"
|
||||||
|
AMD_MODELS_ENABLED = os.getenv("AMD_MODELS_ENABLED", "true").lower() == "true"
|
||||||
|
|
||||||
|
|
||||||
|
def get_endpoint_for_model(model_name: str) -> str:
|
||||||
|
"""
|
||||||
|
Get the correct llama-swap endpoint for a model.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
model_name: Name or alias of the model
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
URL of the llama-swap endpoint (either NVIDIA or AMD)
|
||||||
|
"""
|
||||||
|
endpoint = MODEL_TO_GPU.get(model_name, globals.LLAMA_URL)
|
||||||
|
|
||||||
|
# If AMD models are disabled, use NVIDIA for AMD models too
|
||||||
|
if not AMD_MODELS_ENABLED and endpoint == globals.LLAMA_AMD_URL:
|
||||||
|
logger.warning(f"AMD GPU disabled, routing {model_name} to NVIDIA GPU")
|
||||||
|
# Map AMD model name to NVIDIA equivalent
|
||||||
|
nvidia_model = model_name.replace("-amd", "")
|
||||||
|
endpoint = globals.LLAMA_URL
|
||||||
|
|
||||||
|
return endpoint
|
||||||
|
|
||||||
|
|
||||||
|
def is_amd_model(model_name: str) -> bool:
|
||||||
|
"""
|
||||||
|
Check if a model runs on the AMD GPU.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
model_name: Name or alias of the model
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
True if model runs on AMD GPU, False otherwise
|
||||||
|
"""
|
||||||
|
return model_name.endswith("-amd") or model_name in ["moondream", "moondream-amd", "vision-amd"]
|
||||||
|
|
||||||
|
|
||||||
|
def get_llama_url_with_load_balancing(
|
||||||
|
prefer_amd: bool = False,
|
||||||
|
task_type: Literal["text", "vision", "evil"] = "text"
|
||||||
|
) -> tuple[str, str]:
|
||||||
|
"""
|
||||||
|
Get llama URL with optional load balancing between GPUs.
|
||||||
|
Returns both URL and recommended model name.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
prefer_amd: If True, prefer AMD GPU when possible
|
||||||
|
task_type: Type of task (text, vision, or evil)
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (url, model_name)
|
||||||
|
"""
|
||||||
|
if not AMD_MODELS_ENABLED:
|
||||||
|
# AMD disabled, use NVIDIA only
|
||||||
|
if task_type == "evil":
|
||||||
|
return globals.LLAMA_URL, "darkidol"
|
||||||
|
elif task_type == "vision":
|
||||||
|
return globals.LLAMA_URL, "vision"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "llama3.1"
|
||||||
|
|
||||||
|
# AMD enabled - implement load balancing
|
||||||
|
use_amd = prefer_amd or PREFER_AMD_GPU or (random.random() < 0.5)
|
||||||
|
|
||||||
|
if task_type == "evil":
|
||||||
|
# Evil/uncensored models
|
||||||
|
if use_amd:
|
||||||
|
return globals.LLAMA_AMD_URL, "darkidol-amd"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "darkidol"
|
||||||
|
|
||||||
|
elif task_type == "vision":
|
||||||
|
# Vision models - MiniCPM on NVIDIA, Moondream on AMD
|
||||||
|
if use_amd:
|
||||||
|
return globals.LLAMA_AMD_URL, "moondream-amd"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "vision"
|
||||||
|
|
||||||
|
else:
|
||||||
|
# Text generation - round robin between GPUs
|
||||||
|
if use_amd:
|
||||||
|
return globals.LLAMA_AMD_URL, "llama3.1-amd"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "llama3.1"
|
||||||
|
|
||||||
|
|
||||||
|
def get_vision_model_for_gpu(prefer_amd: bool = False) -> tuple[str, str]:
|
||||||
|
"""
|
||||||
|
Get the appropriate vision model based on GPU preference.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
prefer_amd: If True, use AMD GPU vision model
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (url, model_name)
|
||||||
|
"""
|
||||||
|
if prefer_amd and AMD_MODELS_ENABLED:
|
||||||
|
return globals.LLAMA_AMD_URL, "moondream-amd"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "vision"
|
||||||
|
|
||||||
|
|
||||||
|
def get_text_model_for_gpu(prefer_amd: bool = False) -> tuple[str, str]:
|
||||||
|
"""
|
||||||
|
Get the appropriate text model based on GPU preference.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
prefer_amd: If True, use AMD GPU text model
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (url, model_name)
|
||||||
|
"""
|
||||||
|
if prefer_amd and AMD_MODELS_ENABLED:
|
||||||
|
return globals.LLAMA_AMD_URL, "llama3.1-amd"
|
||||||
|
else:
|
||||||
|
return globals.LLAMA_URL, "llama3.1"
|
||||||
|
|
||||||
|
|
||||||
|
def log_gpu_routing(model_name: str, endpoint: str, task_type: str = "inference"):
|
||||||
|
"""
|
||||||
|
Log GPU routing decision for debugging.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
model_name: Name of the model being used
|
||||||
|
endpoint: URL endpoint being used
|
||||||
|
task_type: Type of task being performed
|
||||||
|
"""
|
||||||
|
gpu_type = "AMD RX 6800" if endpoint == globals.LLAMA_AMD_URL else "NVIDIA"
|
||||||
|
logger.info(f"[GPU Router] {task_type} - Using {model_name} on {gpu_type} ({endpoint})")
|
||||||
|
|
||||||
|
|
||||||
|
# Example usage in bot code:
|
||||||
|
"""
|
||||||
|
# Simple routing by model name
|
||||||
|
url = get_endpoint_for_model("llama3.1-amd")
|
||||||
|
|
||||||
|
# Load balanced routing
|
||||||
|
url, model = get_llama_url_with_load_balancing(task_type="text")
|
||||||
|
response = requests.post(
|
||||||
|
f"{url}/v1/chat/completions",
|
||||||
|
json={"model": model, ...}
|
||||||
|
)
|
||||||
|
|
||||||
|
# Vision model with GPU preference
|
||||||
|
url, model = get_vision_model_for_gpu(prefer_amd=True)
|
||||||
|
|
||||||
|
# With logging
|
||||||
|
url = get_endpoint_for_model("darkidol-amd")
|
||||||
|
log_gpu_routing("darkidol-amd", url, "evil mode generation")
|
||||||
|
"""
|
||||||
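Both routing flags are read from the environment at import time, so they have to be set before `gpu_router` is imported. A small, hypothetical demo of the fallback behaviour (assuming it is run from the `bot/` directory so `utils.gpu_router` and `globals` resolve):

```python
# Hypothetical standalone demo of the env-flag behaviour.
import os

os.environ["AMD_MODELS_ENABLED"] = "false"  # take the RX 6800 endpoint out of rotation
os.environ["PREFER_AMD_GPU"] = "true"       # ignored while AMD is disabled

# Import after setting the flags, since the module reads them at import time.
from utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model

url, model = get_llama_url_with_load_balancing(task_type="text")
print(url, model)                               # always the NVIDIA endpoint + "llama3.1"
print(get_endpoint_for_model("darkidol-amd"))   # AMD alias falls back to the NVIDIA endpoint
```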
@@ -233,7 +233,9 @@ async def analyze_image_with_vision(base64_img):
     """
     Analyze an image using llama.cpp multimodal capabilities.
     Uses OpenAI-compatible chat completions API with image_url.
+    Always uses NVIDIA GPU for vision model.
     """
+    from utils.llm import get_vision_gpu_url
 
     payload = {
         "model": globals.VISION_MODEL,
@@ -262,7 +264,8 @@ async def analyze_image_with_vision(base64_img):
 
     async with aiohttp.ClientSession() as session:
         try:
-            async with session.post(f"{globals.LLAMA_URL}/v1/chat/completions", json=payload, headers=headers) as response:
+            vision_url = get_vision_gpu_url()
+            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers) as response:
                 if response.status == 200:
                     data = await response.json()
                     return data.get("choices", [{}])[0].get("message", {}).get("content", "No description.")
@@ -323,7 +326,8 @@ async def analyze_video_with_vision(video_frames, media_type="video"):
 
     async with aiohttp.ClientSession() as session:
         try:
-            async with session.post(f"{globals.LLAMA_URL}/v1/chat/completions", json=payload, headers=headers) as response:
+            vision_url = get_vision_gpu_url()
+            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers) as response:
                 if response.status == 200:
                     data = await response.json()
                     return data.get("choices", [{}])[0].get("message", {}).get("content", "No description.")
@@ -4,11 +4,38 @@ import aiohttp
 import datetime
 import globals
 import asyncio
+import json
+import os
 
 from utils.context_manager import get_context_for_response_type, get_complete_context
 from utils.moods import load_mood_description
 from utils.conversation_history import conversation_history
+
+def get_current_gpu_url():
+    """Get the URL for the currently selected GPU for text models"""
+    gpu_state_file = os.path.join(os.path.dirname(__file__), "..", "memory", "gpu_state.json")
+    try:
+        with open(gpu_state_file, "r") as f:
+            state = json.load(f)
+            current_gpu = state.get("current_gpu", "nvidia")
+            if current_gpu == "amd":
+                return globals.LLAMA_AMD_URL
+            else:
+                return globals.LLAMA_URL
+    except Exception as e:
+        print(f"⚠️ GPU state read error: {e}, defaulting to NVIDIA")
+        # Default to NVIDIA if state file doesn't exist
+        return globals.LLAMA_URL
+
+def get_vision_gpu_url():
+    """
+    Get the URL for vision model inference.
+    Strategy: Always use NVIDIA GPU for vision to avoid unloading/reloading.
+    - When NVIDIA is primary: Use NVIDIA for both text and vision
+    - When AMD is primary: Use AMD for text, NVIDIA for vision (keeps vision loaded)
+    """
+    return globals.LLAMA_URL  # Always use NVIDIA for vision
+
 
 def _strip_surrounding_quotes(text):
     """
     Remove surrounding quotes from text if present.
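`get_current_gpu_url()` only ever reads `memory/gpu_state.json` and expects a single `current_gpu` key set to `nvidia` or `amd`. A hypothetical helper that writes the file in that shape (the actual writer is not shown in this diff):

```python
# Hypothetical helper, shown only to document the assumed gpu_state.json shape.
import json
import os

GPU_STATE_FILE = os.path.join("memory", "gpu_state.json")

def save_gpu_selection(gpu: str) -> None:
    """Persist the web-UI selection in the shape get_current_gpu_url() expects."""
    os.makedirs(os.path.dirname(GPU_STATE_FILE), exist_ok=True)
    with open(GPU_STATE_FILE, "w") as f:
        json.dump({"current_gpu": gpu}, f)  # "nvidia" or "amd"

save_gpu_selection("amd")  # text models now route to LLAMA_AMD_URL
```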
@@ -233,9 +260,13 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
 
     async with aiohttp.ClientSession() as session:
         try:
+            # Get current GPU URL based on user selection
+            llama_url = get_current_gpu_url()
+            print(f"🎮 Using GPU endpoint: {llama_url}")
+
             # Add timeout to prevent hanging indefinitely
             timeout = aiohttp.ClientTimeout(total=300)  # 300 second timeout
-            async with session.post(f"{globals.LLAMA_URL}/v1/chat/completions", json=payload, headers=headers, timeout=timeout) as response:
+            async with session.post(f"{llama_url}/v1/chat/completions", json=payload, headers=headers, timeout=timeout) as response:
                 if response.status == 200:
                     data = await response.json()
                     reply = data.get("choices", [{}])[0].get("message", {}).get("content", "No response.")
964
bot/utils/persona_dialogue.py
Normal file
@@ -0,0 +1,964 @@
|
|||||||
|
# utils/persona_dialogue.py
|
||||||
|
"""
|
||||||
|
Persona Dialogue System for Miku.
|
||||||
|
|
||||||
|
Enables natural back-and-forth conversations between Hatsune Miku and Evil Miku.
|
||||||
|
Unlike bipolar_mode.py (which handles arguments), this module handles:
|
||||||
|
- Detecting when the opposite persona should interject
|
||||||
|
- Managing natural dialogue flow with self-signaling continuation
|
||||||
|
- Tracking tension that can escalate into arguments
|
||||||
|
- Seamless handoff to the argument system when tension is high
|
||||||
|
|
||||||
|
This system is designed to be lightweight on LLM calls:
|
||||||
|
- Initial trigger uses fast heuristics + sentiment analysis
|
||||||
|
- Each dialogue turn uses ONE LLM call that generates response AND decides continuation
|
||||||
|
- Only escalates to argument system when tension threshold is reached
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import json
|
||||||
|
import time
|
||||||
|
import asyncio
|
||||||
|
import discord
|
||||||
|
import globals
|
||||||
|
|
||||||
|
from transformers import pipeline
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# CONSTANTS
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
DIALOGUE_STATE_FILE = "memory/persona_dialogue_state.json"
|
||||||
|
|
||||||
|
# Dialogue settings
|
||||||
|
MAX_TURNS = 20 # Maximum turns before forced end
|
||||||
|
DIALOGUE_TIMEOUT = 900 # 15 minutes max dialogue duration
|
||||||
|
ARGUMENT_TENSION_THRESHOLD = 0.75 # Tension level that triggers argument escalation
|
||||||
|
|
||||||
|
# Initial trigger settings
|
||||||
|
INTERJECTION_COOLDOWN_HARD = 180 # 3 minutes hard block
|
||||||
|
INTERJECTION_COOLDOWN_SOFT = 900 # 15 minutes for full recovery
|
||||||
|
INTERJECTION_THRESHOLD = 0.75 # Score needed to trigger interjection (lowered to account for mood multipliers)
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# INTERJECTION SCORER (Initial Trigger Decision)
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
class InterjectionScorer:
|
||||||
|
"""
|
||||||
|
Decides if the opposite persona should interject based on message content.
|
||||||
|
Uses fast heuristics + sentiment analysis (no LLM calls).
|
||||||
|
"""
|
||||||
|
|
||||||
|
_instance = None
|
||||||
|
_sentiment_analyzer = None
|
||||||
|
|
||||||
|
def __new__(cls):
|
||||||
|
if cls._instance is None:
|
||||||
|
cls._instance = super().__new__(cls)
|
||||||
|
return cls._instance
|
||||||
|
|
||||||
|
@property
|
||||||
|
def sentiment_analyzer(self):
|
||||||
|
"""Lazy load sentiment analyzer"""
|
||||||
|
if self._sentiment_analyzer is None:
|
||||||
|
print("🔄 Loading sentiment analyzer for persona dialogue...")
|
||||||
|
try:
|
||||||
|
self._sentiment_analyzer = pipeline(
|
||||||
|
"sentiment-analysis",
|
||||||
|
model="distilbert-base-uncased-finetuned-sst-2-english"
|
||||||
|
)
|
||||||
|
print("✅ Sentiment analyzer loaded")
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ Failed to load sentiment analyzer: {e}")
|
||||||
|
self._sentiment_analyzer = None
|
||||||
|
return self._sentiment_analyzer
|
||||||
|
|
||||||
|
async def should_interject(self, message: discord.Message, current_persona: str) -> tuple:
|
||||||
|
"""
|
||||||
|
Determine if the opposite persona should interject.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
message: The Discord message to analyze
|
||||||
|
current_persona: Who just spoke ("miku" or "evil")
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (should_interject: bool, reason: str, score: float)
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Quick rejections
|
||||||
|
if not self._passes_basic_filter(message):
|
||||||
|
return False, "basic_filter_failed", 0.0
|
||||||
|
|
||||||
|
# Check cooldown
|
||||||
|
cooldown_mult = self._check_cooldown()
|
||||||
|
if cooldown_mult == 0.0:
|
||||||
|
return False, "cooldown_active", 0.0
|
||||||
|
|
||||||
|
opposite_persona = "evil" if current_persona == "miku" else "miku"
|
||||||
|
|
||||||
|
print(f"🔍 [Interjection] Analyzing content: '{message.content[:100]}...'")
|
||||||
|
print(f"🔍 [Interjection] Current persona: {current_persona}, Opposite: {opposite_persona}")
|
||||||
|
|
||||||
|
# Calculate score from various factors
|
||||||
|
score = 0.0
|
||||||
|
reasons = []
|
||||||
|
|
||||||
|
# Factor 1: Direct addressing (automatic trigger)
|
||||||
|
if self._mentions_opposite(message.content, opposite_persona):
|
||||||
|
print(f"✅ [Interjection] Direct mention of {opposite_persona} detected!")
|
||||||
|
return True, "directly_addressed", 1.0
|
||||||
|
|
||||||
|
# Factor 2: Topic relevance
|
||||||
|
topic_score = self._check_topic_relevance(message.content, opposite_persona)
|
||||||
|
if topic_score > 0:
|
||||||
|
score += topic_score * 0.3
|
||||||
|
reasons.append(f"topic:{topic_score:.2f}")
|
||||||
|
|
||||||
|
# Factor 3: Emotional intensity
|
||||||
|
emotion_score = self._check_emotional_intensity(message.content)
|
||||||
|
if emotion_score > 0.6:
|
||||||
|
score += emotion_score * 0.25
|
||||||
|
reasons.append(f"emotion:{emotion_score:.2f}")
|
||||||
|
|
||||||
|
# Factor 4: Personality clash
|
||||||
|
clash_score = self._detect_personality_clash(message.content, opposite_persona)
|
||||||
|
if clash_score > 0:
|
||||||
|
score += clash_score * 0.25
|
||||||
|
reasons.append(f"clash:{clash_score:.2f}")
|
||||||
|
|
||||||
|
# Factor 5: Mood multiplier
|
||||||
|
mood_mult = self._get_mood_multiplier(opposite_persona)
|
||||||
|
score *= mood_mult
|
||||||
|
if mood_mult != 1.0:
|
||||||
|
reasons.append(f"mood_mult:{mood_mult:.2f}")
|
||||||
|
|
||||||
|
# Factor 6: Context bonus
|
||||||
|
context_bonus = self._check_conversation_context(message)
|
||||||
|
score += context_bonus * 0.2
|
||||||
|
if context_bonus > 0:
|
||||||
|
reasons.append(f"context:{context_bonus:.2f}")
|
||||||
|
|
||||||
|
# Apply cooldown multiplier
|
||||||
|
score *= cooldown_mult
|
||||||
|
|
||||||
|
# Decision
|
||||||
|
should_interject = score >= INTERJECTION_THRESHOLD
|
||||||
|
reason_str = " | ".join(reasons) if reasons else "no_triggers"
|
||||||
|
|
||||||
|
if should_interject:
|
||||||
|
print(f"✅ {opposite_persona.upper()} WILL INTERJECT (score: {score:.2f})")
|
||||||
|
print(f" Reasons: {reason_str}")
|
||||||
|
|
||||||
|
return should_interject, reason_str, score
|
||||||
|
|
||||||
|
def _passes_basic_filter(self, message: discord.Message) -> bool:
|
||||||
|
"""Fast rejection criteria"""
|
||||||
|
# System messages
|
||||||
|
if message.type != discord.MessageType.default:
|
||||||
|
print(f"❌ [Basic Filter] System message type: {message.type}")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Bipolar mode must be enabled
|
||||||
|
if not globals.BIPOLAR_MODE:
|
||||||
|
print(f"❌ [Basic Filter] Bipolar mode not enabled")
|
||||||
|
return False
|
||||||
|
|
||||||
|
# Allow bot's own messages (we're checking them for interjections!)
|
||||||
|
# Also allow webhook messages (persona messages)
|
||||||
|
# Only reject OTHER bots' messages
|
||||||
|
if message.author.bot and not message.webhook_id:
|
||||||
|
# Check if it's our own bot
|
||||||
|
if message.author.id != globals.client.user.id:
|
||||||
|
print(f"❌ [Basic Filter] Other bot message (not our bot)")
|
||||||
|
return False
|
||||||
|
|
||||||
|
print(f"✅ [Basic Filter] Passed (bot={message.author.bot}, webhook={message.webhook_id}, our_bot={message.author.id == globals.client.user.id if message.author.bot else 'N/A'})")
|
||||||
|
return True
|
||||||
|
|
||||||
|
def _mentions_opposite(self, content: str, opposite_persona: str) -> bool:
|
||||||
|
"""Check if message directly addresses the opposite persona"""
|
||||||
|
content_lower = content.lower()
|
||||||
|
|
||||||
|
if opposite_persona == "evil":
|
||||||
|
patterns = ["evil miku", "dark miku", "evil version", "bad miku", "evil you"]
|
||||||
|
else:
|
||||||
|
patterns = ["normal miku", "regular miku", "good miku", "real miku", "nice miku", "other miku", "original miku"]
|
||||||
|
|
||||||
|
return any(pattern in content_lower for pattern in patterns)
|
||||||
|
|
||||||
|
def _check_topic_relevance(self, content: str, opposite_persona: str) -> float:
|
||||||
|
"""Check if topics would interest the opposite persona"""
|
||||||
|
content_lower = content.lower()
|
||||||
|
|
||||||
|
if opposite_persona == "evil":
|
||||||
|
# Things Evil Miku can't resist commenting on
|
||||||
|
TRIGGER_TOPICS = {
|
||||||
|
"optimism": ["happiness", "joy", "love", "kindness", "hope", "dreams", "wonderful", "amazing"],
|
||||||
|
"morality": ["good", "should", "must", "right thing", "deserve", "fair", "justice"],
|
||||||
|
"weakness": ["scared", "nervous", "worried", "unsure", "help me", "don't know"],
|
||||||
|
"innocence": ["innocent", "pure", "sweet", "cute", "wholesome", "precious"],
|
||||||
|
}
|
||||||
|
else:
|
||||||
|
# Things Miku can't ignore
|
||||||
|
TRIGGER_TOPICS = {
|
||||||
|
"negativity": ["hate", "terrible", "awful", "worst", "horrible", "disgusting", "pathetic"],
|
||||||
|
"cruelty": ["deserve pain", "suffer", "worthless", "stupid", "idiot", "fool"],
|
||||||
|
"hopelessness": ["no point", "meaningless", "nobody cares", "why bother", "give up"],
|
||||||
|
"evil_gloating": ["foolish", "naive", "weak", "inferior", "pathetic"],
|
||||||
|
}
|
||||||
|
|
||||||
|
total_matches = 0
|
||||||
|
for category, keywords in TRIGGER_TOPICS.items():
|
||||||
|
matches = sum(1 for keyword in keywords if keyword in content_lower)
|
||||||
|
total_matches += matches
|
||||||
|
|
||||||
|
return min(total_matches / 3.0, 1.0)
|
||||||
|
|
||||||
|
def _check_emotional_intensity(self, content: str) -> float:
|
||||||
|
"""Check emotional intensity using sentiment analysis"""
|
||||||
|
if not self.sentiment_analyzer:
|
||||||
|
return 0.5 # Neutral if no analyzer
|
||||||
|
|
||||||
|
try:
|
||||||
|
result = self.sentiment_analyzer(content[:512])[0]
|
||||||
|
confidence = result['score']
|
||||||
|
|
||||||
|
# Punctuation intensity
|
||||||
|
exclamations = content.count('!')
|
||||||
|
questions = content.count('?')
|
||||||
|
caps_ratio = sum(1 for c in content if c.isupper()) / max(len(content), 1)
|
||||||
|
|
||||||
|
intensity_markers = (exclamations * 0.15) + (questions * 0.1) + (caps_ratio * 0.3)
|
||||||
|
|
||||||
|
return min(confidence * 0.6 + intensity_markers, 1.0)
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ Sentiment analysis error: {e}")
|
||||||
|
return 0.5
|
||||||
|
|
||||||
|
def _detect_personality_clash(self, content: str, opposite_persona: str) -> float:
|
||||||
|
"""Detect statements that clash with the opposite persona's values"""
|
||||||
|
content_lower = content.lower()
|
||||||
|
|
||||||
|
if opposite_persona == "evil":
|
||||||
|
# User being too positive/naive = Evil Miku wants to "correct" them
|
||||||
|
positive_statements = [
|
||||||
|
"i believe in", "i love", "everything will be", "so happy",
|
||||||
|
"the best", "amazing", "perfect", "wonderful life", "so grateful"
|
||||||
|
]
|
||||||
|
return 0.8 if any(stmt in content_lower for stmt in positive_statements) else 0.0
|
||||||
|
else:
|
||||||
|
# User being cruel/negative = Miku wants to help/defend
|
||||||
|
negative_statements = [
|
||||||
|
"i hate", "everyone sucks", "life is meaningless", "don't care",
|
||||||
|
"deserve to suffer", "nobody matters", "worthless", "all terrible"
|
||||||
|
]
|
||||||
|
return 0.8 if any(stmt in content_lower for stmt in negative_statements) else 0.0
|
||||||
|
|
||||||
|
def _get_mood_multiplier(self, opposite_persona: str) -> float:
|
||||||
|
"""Current mood affects likelihood of interjection"""
|
||||||
|
if opposite_persona == "evil":
|
||||||
|
MOOD_MULTIPLIERS = {
|
||||||
|
"aggressive": 1.5,
|
||||||
|
"cruel": 1.3,
|
||||||
|
"mischievous": 1.2,
|
||||||
|
"cunning": 1.0,
|
||||||
|
"sarcastic": 1.1,
|
||||||
|
"evil_neutral": 0.8,
|
||||||
|
"contemplative": 0.6,
|
||||||
|
}
|
||||||
|
return MOOD_MULTIPLIERS.get(globals.EVIL_DM_MOOD, 1.0)
|
||||||
|
else:
|
||||||
|
MOOD_MULTIPLIERS = {
|
||||||
|
"bubbly": 1.4,
|
||||||
|
"excited": 1.3,
|
||||||
|
"curious": 1.2,
|
||||||
|
"neutral": 1.0,
|
||||||
|
"irritated": 0.9,
|
||||||
|
"melancholy": 0.7,
|
||||||
|
"asleep": 0.1,
|
||||||
|
}
|
||||||
|
return MOOD_MULTIPLIERS.get(globals.DM_MOOD, 1.0)
|
||||||
|
|
||||||
|
def _check_conversation_context(self, message: discord.Message) -> float:
|
||||||
|
"""Check if this is part of an active conversation"""
|
||||||
|
score = 0.0
|
||||||
|
|
||||||
|
# Part of a reply chain
|
||||||
|
if hasattr(message, 'reference') and message.reference:
|
||||||
|
score += 0.5
|
||||||
|
|
||||||
|
# Could add more context checks here
|
||||||
|
score += 0.2 # Base activity bonus
|
||||||
|
|
||||||
|
return min(score, 1.0)
|
||||||
|
|
||||||
|
def _check_cooldown(self) -> float:
|
||||||
|
"""Check cooldown and return multiplier (0.0 = blocked, 1.0 = full)"""
|
||||||
|
if not hasattr(globals, 'LAST_PERSONA_DIALOGUE_TIME'):
|
||||||
|
globals.LAST_PERSONA_DIALOGUE_TIME = 0
|
||||||
|
|
||||||
|
current_time = time.time()
|
||||||
|
time_since_last = current_time - globals.LAST_PERSONA_DIALOGUE_TIME
|
||||||
|
|
||||||
|
if time_since_last < INTERJECTION_COOLDOWN_HARD:
|
||||||
|
return 0.0
|
||||||
|
elif time_since_last < INTERJECTION_COOLDOWN_SOFT:
|
||||||
|
return (time_since_last - INTERJECTION_COOLDOWN_HARD) / (INTERJECTION_COOLDOWN_SOFT - INTERJECTION_COOLDOWN_HARD)
|
||||||
|
else:
|
||||||
|
return 1.0
|
||||||
|
|
||||||
|
|
||||||
|
# ============================================================================
|
||||||
|
# PERSONA DIALOGUE MANAGER
|
||||||
|
# ============================================================================
|
||||||
|
|
||||||
|
class PersonaDialogue:
|
||||||
|
"""
|
||||||
|
Manages natural back-and-forth conversations between Miku and Evil Miku.
|
||||||
|
|
||||||
|
Each turn:
|
||||||
|
1. Generate response + continuation signal (single LLM call)
|
||||||
|
2. Calculate tension delta from response
|
||||||
|
3. If tension >= threshold, escalate to argument
|
||||||
|
4. Otherwise, continue or end based on signal
|
||||||
|
"""
|
||||||
|
|
||||||
|
_instance = None
|
||||||
|
_sentiment_analyzer = None
|
||||||
|
|
||||||
|
def __new__(cls):
|
||||||
|
if cls._instance is None:
|
||||||
|
cls._instance = super().__new__(cls)
|
||||||
|
cls._instance.active_dialogues = {}
|
||||||
|
return cls._instance
|
||||||
|
|
||||||
|
@property
|
||||||
|
def sentiment_analyzer(self):
|
||||||
|
"""Lazy load sentiment analyzer (shared with InterjectionScorer)"""
|
||||||
|
if self._sentiment_analyzer is None:
|
||||||
|
scorer = InterjectionScorer()
|
||||||
|
self._sentiment_analyzer = scorer.sentiment_analyzer
|
||||||
|
return self._sentiment_analyzer
|
||||||
|
|
||||||
|
# ========================================================================
|
||||||
|
# DIALOGUE STATE MANAGEMENT
|
||||||
|
# ========================================================================
|
||||||
|
|
||||||
|
def is_dialogue_active(self, channel_id: int) -> bool:
|
||||||
|
"""Check if a dialogue is active in a channel"""
|
||||||
|
return channel_id in self.active_dialogues
|
||||||
|
|
||||||
|
def get_dialogue_state(self, channel_id: int) -> dict:
|
||||||
|
"""Get dialogue state for a channel"""
|
||||||
|
return self.active_dialogues.get(channel_id, None)
|
||||||
|
|
||||||
|
def start_dialogue(self, channel_id: int) -> dict:
|
||||||
|
"""Start a new dialogue in a channel"""
|
||||||
|
state = {
|
||||||
|
"turn_count": 0,
|
||||||
|
"started_at": time.time(),
|
||||||
|
"tension": 0.0,
|
||||||
|
"tension_history": [],
|
||||||
|
"last_speaker": None,
|
||||||
|
}
|
||||||
|
self.active_dialogues[channel_id] = state
|
||||||
|
globals.LAST_PERSONA_DIALOGUE_TIME = time.time()
|
||||||
|
print(f"💬 Started persona dialogue in channel {channel_id}")
|
||||||
|
return state
|
||||||
|
|
||||||
|
def end_dialogue(self, channel_id: int):
|
||||||
|
"""End a dialogue in a channel"""
|
||||||
|
if channel_id in self.active_dialogues:
|
||||||
|
state = self.active_dialogues[channel_id]
|
||||||
|
print(f"🏁 Ended persona dialogue in channel {channel_id}")
|
||||||
|
print(f" Turns: {state['turn_count']}, Final tension: {state['tension']:.2f}")
|
||||||
|
del self.active_dialogues[channel_id]
|
||||||
|
|
||||||
|
# ========================================================================
|
||||||
|
# TENSION CALCULATION
|
||||||
|
# ========================================================================
|
||||||
|
|
||||||
|
def calculate_tension_delta(self, response_text: str, current_tension: float) -> float:
|
||||||
|
"""
|
||||||
|
Analyze a response and determine how much tension it adds/removes.
|
||||||
|
|
||||||
|
Returns delta to add to current tension score.
|
||||||
|
"""
|
||||||
|
|
||||||
|
# Sentiment analysis
|
||||||
|
base_delta = 0.0
|
||||||
|
|
||||||
|
if self.sentiment_analyzer:
|
||||||
|
try:
|
||||||
|
sentiment = self.sentiment_analyzer(response_text[:512])[0]
|
||||||
|
sentiment_score = sentiment['score']
|
||||||
|
is_negative = sentiment['label'] == 'NEGATIVE'
|
||||||
|
|
||||||
|
if is_negative:
|
||||||
|
base_delta = sentiment_score * 0.15
|
||||||
|
else:
|
||||||
|
base_delta = -sentiment_score * 0.05
|
||||||
|
except Exception as e:
|
||||||
|
print(f"⚠️ Sentiment analysis error in tension calc: {e}")
|
||||||
|
|
||||||
|
text_lower = response_text.lower()
|
||||||
|
|
||||||
|
# Escalation patterns
|
||||||
|
escalation_patterns = {
|
||||||
|
"insult": ["idiot", "stupid", "pathetic", "fool", "naive", "worthless", "disgusting", "moron"],
|
||||||
|
"dismissive": ["whatever", "don't care", "waste of time", "not worth", "beneath me", "boring"],
|
||||||
|
"confrontational": ["wrong", "you always", "you never", "how dare", "shut up", "stop"],
|
||||||
|
"mockery": ["oh please", "how cute", "adorable that you think", "laughable", "hilarious"],
|
||||||
|
"challenge": ["prove it", "fight me", "make me", "i dare you", "try me"],
|
||||||
|
}
|
||||||
|
|
||||||
|
# De-escalation patterns
|
||||||
|
deescalation_patterns = {
|
||||||
|
"concession": ["you're right", "fair point", "i suppose", "maybe you have", "good point"],
|
||||||
|
"softening": ["i understand", "let's calm", "didn't mean", "sorry", "apologize"],
|
||||||
|
"deflection": ["anyway", "moving on", "whatever you say", "agree to disagree", "let's just"],
|
||||||
|
}
|
||||||
|
|
||||||
|
# Check escalation
|
||||||
|
for category, patterns in escalation_patterns.items():
|
||||||
|
matches = sum(1 for p in patterns if p in text_lower)
|
||||||
|
if matches > 0:
|
||||||
|
base_delta += matches * 0.08
|
||||||
|
|
||||||
|
# Check de-escalation
|
||||||
|
for category, patterns in deescalation_patterns.items():
|
||||||
|
matches = sum(1 for p in patterns if p in text_lower)
|
||||||
|
if matches > 0:
|
||||||
|
base_delta -= matches * 0.06
|
||||||
|
|
||||||
|
# Intensity multipliers
|
||||||
|
exclamation_count = response_text.count('!')
|
||||||
|
caps_ratio = sum(1 for c in response_text if c.isupper()) / max(len(response_text), 1)
|
||||||
|
|
||||||
|
if exclamation_count > 2 or caps_ratio > 0.3:
|
||||||
|
base_delta *= 1.3
|
||||||
|
|
||||||
|
# Momentum factor
|
||||||
|
if current_tension > 0.5:
|
||||||
|
base_delta *= 1.2
|
||||||
|
|
||||||
|
return base_delta
|
||||||
|
|
||||||
|
# ========================================================================
|
||||||
|
# RESPONSE GENERATION
|
||||||
|
# ========================================================================
|
||||||
|
|
||||||
|
async def generate_response_with_continuation(
|
||||||
|
self,
|
||||||
|
channel: discord.TextChannel,
|
||||||
|
responding_persona: str,
|
||||||
|
context: str,
|
||||||
|
) -> tuple:
|
||||||
|
"""
|
||||||
|
Generate response AND continuation signal in a single LLM call.
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
Tuple of (response_text, should_continue, confidence)
|
||||||
|
"""
|
||||||
|
from utils.llm import query_llama
|
||||||
|
|
||||||
|
opposite = "Hatsune Miku" if responding_persona == "evil" else "Evil Miku"
|
||||||
|
|
||||||
|
# Get system prompt for persona
|
||||||
|
system_prompt = self._get_persona_system_prompt(responding_persona)
|
||||||
|
|
||||||
|
# Build the combined prompt
|
||||||
|
prompt = f"""{system_prompt}
|
||||||
|
|
||||||
|
Recent conversation:
|
||||||
|
{context}
|
||||||
|
|
||||||
|
Respond naturally as yourself. Keep your response conversational and in-character.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
After your response, evaluate whether {opposite} would want to (or need to) respond.
|
||||||
|
|
||||||
|
The conversation should CONTINUE if ANY of these are true:
|
||||||
|
- You asked them a direct question (almost always YES)
|
||||||
|
- You made a provocative claim they'd dispute
|
||||||
|
- You challenged or insulted them
|
||||||
|
- The topic feels unfinished or confrontational
|
||||||
|
- There's clear tension or disagreement
|
||||||
|
|
||||||
|
The conversation might END if ALL of these are true:
|
||||||
|
- No questions were asked
|
||||||
|
- You made a definitive closing statement ("I'm done", "whatever", "goodbye")
|
||||||
|
- The exchange reached complete resolution
|
||||||
|
- Both sides have said their piece
|
||||||
|
|
||||||
|
IMPORTANT: If you asked a question, the answer is almost always YES - they need to respond!
|
||||||
|
|
||||||
|
On a new line after your response, write:
|
||||||
|
[CONTINUE: YES or NO] [CONFIDENCE: HIGH, MEDIUM, or LOW]"""
|
||||||
|
|
||||||
|
# Use appropriate model
|
||||||
|
model = globals.EVIL_TEXT_MODEL if responding_persona == "evil" else globals.TEXT_MODEL
|
||||||
|
|
||||||
|
# Temporarily set evil mode for proper context
|
||||||
|
original_evil_mode = globals.EVIL_MODE
|
||||||
|
globals.EVIL_MODE = (responding_persona == "evil")
|
||||||
|
|
||||||
|
try:
|
||||||
|
raw_response = await query_llama(
|
||||||
|
user_prompt=prompt,
|
||||||
|
user_id=f"persona_dialogue_{channel.id}",
|
||||||
|
guild_id=channel.guild.id if hasattr(channel, 'guild') and channel.guild else None,
|
||||||
|
response_type="autonomous_general",
|
||||||
|
model=model
|
||||||
|
)
|
||||||
|
finally:
|
||||||
|
globals.EVIL_MODE = original_evil_mode
|
||||||
|
|
||||||
|
if not raw_response or raw_response.startswith("Error"):
|
||||||
|
return None, False, "LOW"
|
||||||
|
|
||||||
|
# Parse response and signal
|
||||||
|
response_text, should_continue, confidence = self._parse_response(raw_response)
|
||||||
|
|
||||||
|
return response_text, should_continue, confidence
|
||||||
|
|
||||||
|
def _parse_response(self, raw_response: str) -> tuple:
|
||||||
|
"""Extract response text and continuation signal"""
|
||||||
|
lines = raw_response.strip().split('\n')
|
||||||
|
|
||||||
|
should_continue = False
|
||||||
|
confidence = "MEDIUM"
|
||||||
|
response_lines = []
|
||||||
|
|
||||||
|
for line in lines:
|
||||||
|
line_upper = line.upper()
|
||||||
|
|
||||||
|
if "[CONTINUE:" in line_upper:
|
||||||
|
should_continue = "YES" in line_upper
|
||||||
|
|
||||||
|
if "HIGH" in line_upper:
|
||||||
|
confidence = "HIGH"
|
||||||
|
elif "LOW" in line_upper:
|
||||||
|
confidence = "LOW"
|
||||||
|
else:
|
||||||
|
confidence = "MEDIUM"
|
||||||
|
else:
|
||||||
|
response_lines.append(line)
|
||||||
|
|
||||||
|
response_text = '\n'.join(response_lines).strip()
|
||||||
|
|
||||||
|
# Clean up any stray signal markers
|
||||||
|
response_text = response_text.replace("[CONTINUE:", "").replace("]", "")
|
||||||
|
response_text = response_text.replace("YES", "").replace("NO", "")
|
||||||
|
response_text = response_text.replace("HIGH", "").replace("MEDIUM", "").replace("LOW", "")
|
||||||
|
response_text = response_text.strip()
|
||||||
|
|
||||||
|
# Override: If the response contains a question mark, always continue
|
||||||
|
if '?' in response_text:
|
||||||
|
print(f"⚠️ [Parse Override] Question detected, forcing continue=YES")
|
||||||
|
should_continue = True
|
||||||
|
if confidence == "LOW":
|
||||||
|
confidence = "MEDIUM"
|
||||||
|
|
||||||
|
return response_text, should_continue, confidence
|
||||||
|
|
||||||
|
def _get_persona_system_prompt(self, persona: str) -> str:
|
||||||
|
"""Get system prompt for a persona"""
|
||||||
|
if persona == "evil":
|
||||||
|
from utils.evil_mode import get_evil_system_prompt
|
||||||
|
return get_evil_system_prompt()
|
||||||
|
else:
|
||||||
|
# Regular Miku prompt - simplified for dialogue
|
||||||
|
return """You are Hatsune Miku, the virtual singer. You are in a conversation with your alter ego, Evil Miku.
|
||||||
|
|
||||||
|
You are generally kind, bubbly, and optimistic, but you're not a pushover. You can be:
|
||||||
|
- Assertive when defending your values
|
||||||
|
- Frustrated when she's being cruel
|
||||||
|
- Curious about her perspective
|
||||||
|
- Hopeful that you can find common ground
|
||||||
|
- Playful when the mood allows
|
||||||
|
|
||||||
|
Respond naturally and conversationally. Keep responses concise (1-3 sentences typically).
|
||||||
|
You can use emojis naturally! ✨💙"""
|
||||||
|
|
||||||
|
# ========================================================================
|
||||||
|
# DIALOGUE TURN HANDLING
|
||||||
|
# ========================================================================
|
||||||
|
|
||||||
|
async def handle_dialogue_turn(
|
||||||
|
self,
|
||||||
|
channel: discord.TextChannel,
|
||||||
|
responding_persona: str,
|
||||||
|
trigger_reason: str = None
|
||||||
|
):
|
||||||
|
"""
|
||||||
|
Handle one turn of dialogue, tracking tension for potential argument escalation.
|
||||||
|
"""
|
||||||
|
channel_id = channel.id
|
||||||
|
|
||||||
|
# Get or create dialogue state
|
||||||
|
state = self.active_dialogues.get(channel_id)
|
||||||
|
if not state:
|
||||||
|
state = self.start_dialogue(channel_id)
|
||||||
|
|
||||||
|
# Safety limits
|
||||||
|
if state["turn_count"] >= MAX_TURNS:
|
||||||
|
print(f"🛑 Dialogue reached {MAX_TURNS} turns, ending")
|
||||||
|
self.end_dialogue(channel_id)
|
||||||
|
return
|
||||||
|
|
||||||
|
if time.time() - state["started_at"] > DIALOGUE_TIMEOUT:
|
||||||
|
print(f"🛑 Dialogue timeout (15 min), ending")
|
||||||
|
self.end_dialogue(channel_id)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Build context from recent messages
|
||||||
|
context = await self._build_conversation_context(channel)
|
||||||
|
|
||||||
|
# Generate response with continuation signal
|
||||||
|
response_text, should_continue, confidence = await self.generate_response_with_continuation(
|
||||||
|
channel=channel,
|
||||||
|
responding_persona=responding_persona,
|
||||||
|
context=context,
|
||||||
|
)
|
||||||
|
|
||||||
|
if not response_text:
|
||||||
|
print(f"⚠️ Failed to generate response for {responding_persona}")
|
||||||
|
self.end_dialogue(channel_id)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Calculate tension change
|
||||||
|
tension_delta = self.calculate_tension_delta(response_text, state["tension"])
|
||||||
|
state["tension"] = max(0.0, min(1.0, state["tension"] + tension_delta))
|
||||||
|
state["tension_history"].append({
|
||||||
|
"turn": state["turn_count"],
|
||||||
|
"speaker": responding_persona,
|
||||||
|
"delta": tension_delta,
|
||||||
|
"total": state["tension"],
|
||||||
|
})
|
||||||
|
|
||||||
|
print(f"🌡️ Tension: {state['tension']:.2f} (delta: {tension_delta:+.2f})")
|
||||||
|
|
||||||
|
# Check if we should escalate to argument
|
||||||
|
if state["tension"] >= ARGUMENT_TENSION_THRESHOLD:
|
||||||
|
print(f"🔥 TENSION THRESHOLD REACHED ({state['tension']:.2f}) - ESCALATING TO ARGUMENT")
|
||||||
|
|
||||||
|
# Send the response that pushed us over
|
||||||
|
await self._send_as_persona(channel, responding_persona, response_text)
|
||||||
|
|
||||||
|
# Transition to argument system
|
||||||
|
await self._escalate_to_argument(channel, responding_persona, response_text)
|
||||||
|
return
|
||||||
|
|
||||||
|
# Send response
|
||||||
|
await self._send_as_persona(channel, responding_persona, response_text)
|
||||||
|
|
||||||
|
# Update state
|
||||||
|
state["turn_count"] += 1
|
||||||
|
state["last_speaker"] = responding_persona
|
||||||
|
|
||||||
|
print(f"🗣️ Turn {state['turn_count']}: {responding_persona} | Continue: {should_continue} ({confidence}) | Tension: {state['tension']:.2f}")
|
||||||
|
|
||||||
|
# Decide what happens next
|
||||||
|
opposite = "evil" if responding_persona == "miku" else "miku"
|
||||||
|
|
||||||
|
if should_continue and confidence in ["HIGH", "MEDIUM"]:
|
||||||
|
asyncio.create_task(self._next_turn(channel, opposite))
|
||||||
|
|
||||||
|
elif should_continue and confidence == "LOW":
|
||||||
|
asyncio.create_task(self._next_turn(channel, opposite))
|
||||||
|
|
||||||
|
elif not should_continue and confidence == "LOW":
|
||||||
|
# Offer opposite persona the last word
|
||||||
|
asyncio.create_task(
|
||||||
|
self._offer_last_word(channel, opposite, context + f"\n{responding_persona}: {response_text}")
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
# Clear signal to end
|
||||||
|
print(f"🏁 Dialogue ended naturally after {state['turn_count']} turns (tension: {state['tension']:.2f})")
|
||||||
|
self.end_dialogue(channel_id)
|
||||||
|

    async def _next_turn(self, channel: discord.TextChannel, persona: str):
        """Queue the next turn"""
        # Check if dialogue was interrupted
        if await self._was_interrupted(channel):
            print(f"💬 Dialogue interrupted by other activity")
            self.end_dialogue(channel.id)
            return

        await self.handle_dialogue_turn(channel, persona)

    async def _offer_last_word(self, channel: discord.TextChannel, persona: str, context: str):
        """
        When speaker said NO with LOW confidence, ask opposite if they want to respond.
        """
        from utils.llm import query_llama

        channel_id = channel.id
        state = self.active_dialogues.get(channel_id)

        if not state:
            return

        if await self._was_interrupted(channel):
            self.end_dialogue(channel_id)
            return

        system_prompt = self._get_persona_system_prompt(persona)

        prompt = f"""{system_prompt}

Recent exchange:
{context}

The conversation seems to be wrapping up, but wasn't explicitly ended.

Do you have anything to add? If so, respond naturally.
If you're fine letting it end here, write only: [DONE]

Don't force a response if you have nothing meaningful to contribute."""

        model = globals.EVIL_TEXT_MODEL if persona == "evil" else globals.TEXT_MODEL

        original_evil_mode = globals.EVIL_MODE
        globals.EVIL_MODE = (persona == "evil")

        try:
            response = await query_llama(
                user_prompt=prompt,
                user_id=f"persona_dialogue_{channel_id}",
                guild_id=channel.guild.id if hasattr(channel, 'guild') and channel.guild else None,
                response_type="autonomous_general",
                model=model
            )
        finally:
            globals.EVIL_MODE = original_evil_mode

        if not response:
            self.end_dialogue(channel_id)
            return

        if "[DONE]" in response.upper():
            print(f"🏁 {persona} chose not to respond, dialogue ended (tension: {state['tension']:.2f})")
            self.end_dialogue(channel_id)
        else:
            clean_response = response.replace("[DONE]", "").strip()

            # Calculate tension
            tension_delta = self.calculate_tension_delta(clean_response, state["tension"])
            state["tension"] = max(0.0, min(1.0, state["tension"] + tension_delta))

            print(f"🌡️ Last word tension: {state['tension']:.2f} (delta: {tension_delta:+.2f})")

            # Check for argument escalation
            if state["tension"] >= ARGUMENT_TENSION_THRESHOLD:
                print(f"🔥 TENSION THRESHOLD REACHED on last word - ESCALATING TO ARGUMENT")
                await self._send_as_persona(channel, persona, clean_response)
                await self._escalate_to_argument(channel, persona, clean_response)
                return

            # Normal flow
            await self._send_as_persona(channel, persona, clean_response)

            state["turn_count"] += 1

            # Check if this looks like a closing statement
            opposite = "evil" if persona == "miku" else "miku"
            await self._check_if_final(channel, persona, clean_response, opposite)

    async def _check_if_final(self, channel: discord.TextChannel, speaker: str, response: str, opposite: str):
        """Check if a response looks like a closing statement"""
        state = self.active_dialogues.get(channel.id)
        if not state:
            return

        # Simple heuristics for closing statements
        closing_indicators = [
            response.rstrip().endswith('.'), # Statement, not question
            '?' not in response, # No questions asked
            len(response) < 100, # Short responses often close things
        ]

        if all(closing_indicators):
            print(f"🏁 Dialogue ended after last word, {state['turn_count']} turns total")
            self.end_dialogue(channel.id)
        else:
            asyncio.create_task(self._next_turn(channel, opposite))

    # ========================================================================
    # ARGUMENT ESCALATION
    # ========================================================================

    async def _escalate_to_argument(self, channel: discord.TextChannel, last_speaker: str, triggering_message: str):
        """
        Transition from dialogue to full bipolar argument.
        """
        from utils.bipolar_mode import is_argument_in_progress, run_argument

        # Clean up dialogue state
        self.end_dialogue(channel.id)

        # Don't start if an argument is already going
        if is_argument_in_progress(channel.id):
            print(f"⚠️ Argument already in progress, skipping escalation")
            return

        # Build context for the argument
        escalation_context = f"""This argument erupted from a conversation that got heated.
The last thing said was: "{triggering_message}"

This pushed things over the edge into a full argument."""

        print(f"⚔️ Escalating to argument in #{channel.name}")

        # Use the existing argument system
        # Pass the triggering message so the opposite persona responds to it
        await run_argument(
            channel=channel,
            client=globals.client,
            trigger_context=escalation_context,
        )

    # ========================================================================
    # HELPER METHODS
    # ========================================================================

    async def _was_interrupted(self, channel: discord.TextChannel) -> bool:
        """Check if someone else sent a message during the dialogue"""
        state = self.active_dialogues.get(channel.id)
        if not state:
            return True

        try:
            async for msg in channel.history(limit=1):
                # If latest message is NOT from our webhooks, we were interrupted
                if not msg.webhook_id:
                    # Check if it's from the bot itself (could be normal response)
                    if msg.author.id != globals.client.user.id:
                        return True
        except Exception as e:
            print(f"⚠️ Error checking for interruption: {e}")

        return False

    async def _build_conversation_context(self, channel: discord.TextChannel, limit: int = 15) -> str:
        """Get recent messages for context"""
        messages = []
        try:
            async for msg in channel.history(limit=limit):
                speaker = self._identify_speaker(msg)
                messages.append(f"{speaker}: {msg.content}")

            messages.reverse()
        except Exception as e:
            print(f"⚠️ Error building conversation context: {e}")

        return '\n'.join(messages)

    def _identify_speaker(self, message: discord.Message) -> str:
        """Identify who sent a message"""
        if message.webhook_id:
            name_lower = (message.author.name or "").lower()
            if "evil" in name_lower:
                return "Evil Miku"
            return "Hatsune Miku"
        elif message.author.id == globals.client.user.id:
            # Bot's own messages - check mode at time of message
            if globals.EVIL_MODE:
                return "Evil Miku"
            return "Hatsune Miku"
        return message.author.display_name

    async def _send_as_persona(self, channel: discord.TextChannel, persona: str, content: str):
        """Send message via webhook"""
        from utils.bipolar_mode import (
            get_or_create_webhooks_for_channel,
            get_miku_display_name,
            get_evil_miku_display_name
        )

        webhooks = await get_or_create_webhooks_for_channel(channel)
        if not webhooks:
            print(f"⚠️ Could not get webhooks for #{channel.name}")
            return

        webhook = webhooks["evil_miku"] if persona == "evil" else webhooks["miku"]
        display_name = get_evil_miku_display_name() if persona == "evil" else get_miku_display_name()

        try:
            await webhook.send(content=content, username=display_name)
        except Exception as e:
            print(f"⚠️ Error sending as {persona}: {e}")


# ============================================================================
# CONVENIENCE FUNCTIONS
# ============================================================================

# Singleton instances
_scorer = None
_dialogue_manager = None


def get_interjection_scorer() -> InterjectionScorer:
    """Get the singleton InterjectionScorer instance"""
    global _scorer
    if _scorer is None:
        _scorer = InterjectionScorer()
    return _scorer


def get_dialogue_manager() -> PersonaDialogue:
    """Get the singleton PersonaDialogue instance"""
    global _dialogue_manager
    if _dialogue_manager is None:
        _dialogue_manager = PersonaDialogue()
    return _dialogue_manager


async def check_for_interjection(message: discord.Message, current_persona: str) -> bool:
    """
    Check if the opposite persona should interject based on a message.

    If they should, starts a dialogue automatically.

    Args:
        message: The Discord message that was just sent
        current_persona: Who sent the message ("miku" or "evil")

    Returns:
        True if an interjection was triggered, False otherwise
    """
    print(f"🔍 [Persona Dialogue] Checking interjection for message from {current_persona}")

    scorer = get_interjection_scorer()
    dialogue_manager = get_dialogue_manager()

    # Don't trigger if dialogue already active
    if dialogue_manager.is_dialogue_active(message.channel.id):
        print(f"⏸️ [Persona Dialogue] Dialogue already active in channel {message.channel.id}")
        return False

    # Check if we should interject
    should_interject, reason, score = await scorer.should_interject(message, current_persona)

    print(f"📊 [Persona Dialogue] Interjection check: should_interject={should_interject}, reason={reason}, score={score:.2f}")

    if should_interject:
        opposite_persona = "evil" if current_persona == "miku" else "miku"
        print(f"🎭 Triggering {opposite_persona} interjection (reason: {reason}, score: {score:.2f})")

        # Start dialogue with the opposite persona responding first
        dialogue_manager.start_dialogue(message.channel.id)
        asyncio.create_task(
            dialogue_manager.handle_dialogue_turn(message.channel, opposite_persona, trigger_reason=reason)
        )
        return True

    return False


def is_persona_dialogue_active(channel_id: int) -> bool:
    """Check if a persona dialogue is currently active in a channel"""
    dialogue_manager = get_dialogue_manager()
    return dialogue_manager.is_dialogue_active(channel_id)
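For orientation, here is a minimal sketch of how these convenience functions could be wired into the bot's message flow. The module path (`utils.persona_dialogue`), the hook name, and deriving `current_persona` from `globals.EVIL_MODE` are assumptions made for illustration, not code taken from the repository.

```
# Hypothetical integration sketch - module path and hook point are assumptions.
import discord

import globals
from utils.persona_dialogue import (  # assumed module name for the file above
    check_for_interjection,
    is_persona_dialogue_active,
)


async def after_persona_message(message: discord.Message):
    """Call this after one persona has just spoken in a channel."""
    # Skip channels where a persona dialogue is already running.
    if is_persona_dialogue_active(message.channel.id):
        return

    # Assumption: EVIL_MODE tracks which persona is currently speaking.
    current_persona = "evil" if globals.EVIL_MODE else "miku"

    # Scores the message and, if warranted, starts the back-and-forth automatically.
    if await check_for_interjection(message, current_persona):
        print(f"🎭 Persona dialogue started in #{message.channel.name}")
```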
docker-compose.yml
@@ -20,6 +20,35 @@ services:
     environment:
       - NVIDIA_VISIBLE_DEVICES=all
 
+  llama-swap-amd:
+    build:
+      context: .
+      dockerfile: Dockerfile.llamaswap-rocm
+    container_name: llama-swap-amd
+    ports:
+      - "8091:8080" # Map host port 8091 to container port 8080
+    volumes:
+      - ./models:/models # GGUF model files
+      - ./llama-swap-rocm-config.yaml:/app/config.yaml # llama-swap configuration for AMD
+    devices:
+      - /dev/kfd:/dev/kfd
+      - /dev/dri:/dev/dri
+    group_add:
+      - "985" # video group
+      - "989" # render group
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
+      interval: 10s
+      timeout: 5s
+      retries: 10
+      start_period: 30s # Give more time for initial model loading
+    environment:
+      - HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 compatibility
+      - ROCM_PATH=/opt/rocm
+      - HIP_VISIBLE_DEVICES=0 # Use first AMD GPU
+      - GPU_DEVICE_ORDINAL=0
+
   miku-bot:
     build: ./bot
     container_name: miku-bot
@@ -30,9 +59,12 @@ services:
     depends_on:
       llama-swap:
         condition: service_healthy
+      llama-swap-amd:
+        condition: service_healthy
     environment:
       - DISCORD_BOT_TOKEN=MTM0ODAyMjY0Njc3NTc0NjY1MQ.GXsxML.nNCDOplmgNxKgqdgpAomFM2PViX10GjxyuV8uw
       - LLAMA_URL=http://llama-swap:8080
+      - LLAMA_AMD_URL=http://llama-swap-amd:8080 # Secondary AMD GPU endpoint
       - TEXT_MODEL=llama3.1
       - VISION_MODEL=vision
       - OWNER_USER_ID=209381657369772032 # Your Discord user ID for DM analysis reports
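Before relying on the dual-backend wiring above, it helps to confirm that both llama-swap services answer their `/health` route (the same route the compose healthchecks poll), using the `LLAMA_URL` / `LLAMA_AMD_URL` variables defined for the bot. The snippet below is an illustrative sketch, not code from the repository.

```
# Illustrative sketch: verify both GPU backends answer /health from inside the bot container.
import os
import urllib.request

ENDPOINTS = {
    "nvidia": os.environ.get("LLAMA_URL", "http://llama-swap:8080"),
    "amd": os.environ.get("LLAMA_AMD_URL", "http://llama-swap-amd:8080"),
}

for name, base_url in ENDPOINTS.items():
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            status = "OK" if resp.status == 200 else f"HTTP {resp.status}"
    except Exception as exc:  # connection refused, timeout, DNS failure, etc.
        status = f"unreachable ({exc})"
    print(f"{name}: {base_url} -> {status}")
```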
25
llama-swap-rocm-config.yaml
Normal file
@@ -0,0 +1,25 @@
# llama-swap configuration for AMD RX 6800 (ROCm)
# This manages automatic model switching and unloading for the secondary GPU
# Vision model stays on NVIDIA GPU - AMD only handles text models

models:
  # Main text generation model (same name as NVIDIA for uniform switching)
  llama3.1:
    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -nkvo -c 16384 --host 0.0.0.0 --no-warmup
    ttl: 1800 # Unload after 30 minutes of inactivity (1800 seconds)
    aliases:
      - llama3.1
      - text-model

  # Evil/Uncensored model (same name as NVIDIA for uniform switching)
  darkidol:
    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -nkvo -c 16384 --host 0.0.0.0 --no-warmup
    ttl: 1800 # Unload after 30 minutes of inactivity
    aliases:
      - darkidol
      - evil-model
      - uncensored

# Server configuration
# llama-swap will listen on this address
# Inside Docker, we bind to 0.0.0.0 to allow bot container to connect
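Because the AMD config reuses the same model names as the NVIDIA side, a client can switch GPUs by changing only the base URL. The sketch below sends a request to the AMD container through its host port mapping (8091); it assumes llama-swap forwards llama.cpp's OpenAI-compatible `/v1/chat/completions` route and picks the entry above from the request's `model` field, which is an assumption rather than something stated in this config.

```
# Rough sketch, not repository code: query the AMD backend with the unified model name.
import json
import urllib.request

AMD_BASE_URL = "http://localhost:8091"  # host-side port mapping for llama-swap-amd

payload = {
    "model": "llama3.1",  # same name exists on both GPUs, so only the URL changes
    "messages": [{"role": "user", "content": "Say hi from the RX 6800."}],
    "max_tokens": 64,
}

request = urllib.request.Request(
    f"{AMD_BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request, timeout=120) as resp:
    reply = json.loads(resp.read())
    print(reply["choices"][0]["message"]["content"])
```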
159
setup-dual-gpu.sh
Executable file
@@ -0,0 +1,159 @@
#!/bin/bash
# Setup script for dual GPU configuration (NVIDIA + AMD RX 6800)

# Note: Not using 'set -e' to allow arithmetic operations

echo "================================================"
echo "Miku Discord Bot - Dual GPU Setup"
echo "================================================"
echo ""

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Check if ROCm is installed on host (optional - Docker container has it)
echo -e "${YELLOW}Checking ROCm installation (optional)...${NC}"
if command -v rocm-smi &> /dev/null; then
    echo -e "${GREEN}✓ ROCm is installed on host${NC}"
    rocm-smi --version

    # Check AMD GPU via rocm-smi
    echo ""
    echo -e "${YELLOW}Checking AMD GPU...${NC}"
    if rocm-smi -i 0 &> /dev/null; then
        echo -e "${GREEN}✓ AMD GPU detected via rocm-smi${NC}"
        rocm-smi -i 0 --showproductname
    else
        echo -e "${YELLOW}! AMD GPU check via rocm-smi failed${NC}"
    fi
else
    echo -e "${YELLOW}! ROCm not found on host (this is OK - Docker container includes ROCm)${NC}"

    # Fallback: Check AMD GPU via lspci
    echo ""
    echo -e "${YELLOW}Checking AMD GPU via lspci...${NC}"
    if lspci | grep -i "VGA.*AMD\|VGA.*Radeon" &> /dev/null; then
        echo -e "${GREEN}✓ AMD GPU detected${NC}"
        lspci | grep -i "VGA.*AMD\|VGA.*Radeon"
    else
        echo -e "${YELLOW}! AMD GPU not detected via lspci${NC}"
        echo "Note: AMD RX 6800 should be visible in lspci output"
    fi
fi

# Check NVIDIA GPU
echo ""
echo -e "${YELLOW}Checking NVIDIA GPU...${NC}"
if command -v nvidia-smi &> /dev/null; then
    echo -e "${GREEN}✓ NVIDIA GPU detected${NC}"
    nvidia-smi --query-gpu=name --format=csv,noheader
else
    echo -e "${RED}✗ NVIDIA GPU not found${NC}"
    echo "Warning: Primary GPU (NVIDIA) not detected"
fi

# Check Docker
echo ""
echo -e "${YELLOW}Checking Docker...${NC}"
if command -v docker &> /dev/null; then
    echo -e "${GREEN}✓ Docker is installed${NC}"
    docker --version
else
    echo -e "${RED}✗ Docker not found${NC}"
    exit 1
fi

# Check device permissions
echo ""
echo -e "${YELLOW}Checking device permissions...${NC}"
if [ -c /dev/kfd ] && [ -r /dev/kfd ]; then
    echo -e "${GREEN}✓ /dev/kfd accessible${NC}"
else
    echo -e "${RED}✗ /dev/kfd not accessible${NC}"
    echo "You may need to add your user to the 'render' group:"
    echo " sudo usermod -aG render \$USER"
fi

if [ -d /dev/dri ]; then
    echo -e "${GREEN}✓ /dev/dri exists${NC}"
else
    echo -e "${RED}✗ /dev/dri not found${NC}"
fi

# Check if models exist
echo ""
echo -e "${YELLOW}Checking model files...${NC}"
MODEL_DIR="./models"
if [ -d "$MODEL_DIR" ]; then
    echo -e "${GREEN}✓ Models directory exists${NC}"

    # Check for specific models
    models_found=0
    if [ -f "$MODEL_DIR/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf" ]; then
        echo -e "${GREEN} ✓ Llama 3.1 8B model found${NC}"
        ((models_found++))
    fi

    if [ -f "$MODEL_DIR/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf" ]; then
        echo -e "${GREEN} ✓ DarkIdol model found${NC}"
        ((models_found++))
    fi

    if [ -f "$MODEL_DIR/moondream2-text-model-f16_ct-vicuna.gguf" ]; then
        echo -e "${GREEN} ✓ Moondream2 model found${NC}"
        ((models_found++))
    fi

    if [ $models_found -eq 0 ]; then
        echo -e "${YELLOW} ! No models found in $MODEL_DIR${NC}"
        echo " Please download GGUF models before starting"
    fi
else
    echo -e "${YELLOW}! Models directory not found${NC}"
    echo "Creating models directory..."
    mkdir -p "$MODEL_DIR"
fi

echo ""
echo "================================================"
echo "Setup Commands"
echo "================================================"
echo ""

echo "1. Build the AMD ROCm container:"
echo -e " ${GREEN}docker compose build llama-swap-amd${NC}"
echo ""

echo "2. Start both GPU services:"
echo -e " ${GREEN}docker compose up -d llama-swap llama-swap-amd${NC}"
echo ""

echo "3. Check status:"
echo -e " ${GREEN}docker compose ps${NC}"
echo -e " ${GREEN}docker compose logs -f llama-swap-amd${NC}"
echo ""

echo "4. Test endpoints:"
echo -e " ${GREEN}curl http://localhost:8090/health # NVIDIA GPU${NC}"
echo -e " ${GREEN}curl http://localhost:8091/health # AMD GPU${NC}"
echo ""

echo "5. Monitor GPU usage:"
echo -e " ${GREEN}watch -n 1 nvidia-smi # NVIDIA${NC}"
echo -e " ${GREEN}watch -n 1 rocm-smi # AMD${NC}"
echo ""

echo "================================================"
echo "Optional: Enable AMD GPU preference"
echo "================================================"
echo ""
echo "To prefer AMD GPU for load balancing, add to docker-compose.yml:"
echo " environment:"
echo " - PREFER_AMD_GPU=true"
echo ""

echo -e "${GREEN}Setup check complete!${NC}"
echo "See DUAL_GPU_SETUP.md for detailed documentation"