diff --git a/DUAL_GPU_BUILD_SUMMARY.md b/DUAL_GPU_BUILD_SUMMARY.md new file mode 100644 index 0000000..acf7430 --- /dev/null +++ b/DUAL_GPU_BUILD_SUMMARY.md @@ -0,0 +1,184 @@ +# Dual GPU Setup Summary + +## What We Built + +A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm. + +### Architecture + +``` +Primary GPU (NVIDIA GTX 1660) Secondary GPU (AMD RX 6800) + ↓ ↓ + llama-swap (CUDA) llama-swap-amd (ROCm) + Port: 8090 Port: 8091 + ↓ ↓ + NVIDIA models AMD models + - llama3.1 - llama3.1-amd + - darkidol - darkidol-amd + - vision (MiniCPM) - moondream-amd +``` + +## Files Created + +1. **Dockerfile.llamaswap-rocm** - Custom multi-stage build: + - Stage 1: Builds llama.cpp with ROCm from source + - Stage 2: Builds llama-swap from source + - Stage 3: Runtime image with both binaries + +2. **llama-swap-rocm-config.yaml** - Model configuration for AMD GPU + +3. **docker-compose.yml** - Updated with `llama-swap-amd` service + +4. **bot/utils/gpu_router.py** - Load balancing utility + +5. **bot/globals.py** - Updated with `LLAMA_AMD_URL` + +6. **setup-dual-gpu.sh** - Setup verification script + +7. **DUAL_GPU_SETUP.md** - Comprehensive documentation + +8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide + +## Why Custom Build? + +- llama.cpp doesn't publish ROCm Docker images (yet) +- llama-swap doesn't provide ROCm variants +- Building from source ensures latest ROCm compatibility +- Full control over compilation flags and optimization + +## Build Time + +The initial build takes 15-30 minutes depending on your system: +- llama.cpp compilation: ~10-20 minutes +- llama-swap compilation: ~1-2 minutes +- Image layering: ~2-5 minutes + +Subsequent builds are much faster due to Docker layer caching. + +## Next Steps + +Once the build completes: + +```bash +# 1. Start both GPU services +docker compose up -d llama-swap llama-swap-amd + +# 2. Verify both are running +docker compose ps + +# 3. Test NVIDIA GPU +curl http://localhost:8090/health + +# 4. Test AMD GPU +curl http://localhost:8091/health + +# 5. Monitor logs +docker compose logs -f llama-swap-amd + +# 6. Test model loading on AMD +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 50 + }' +``` + +## Device Access + +The AMD container has access to: +- `/dev/kfd` - AMD GPU kernel driver +- `/dev/dri` - Direct Rendering Infrastructure +- Groups: `video`, `render` + +## Environment Variables + +RX 6800 specific settings: +```yaml +HSA_OVERRIDE_GFX_VERSION=10.3.0 # Navi 21 (gfx1030) compatibility +ROCM_PATH=/opt/rocm +HIP_VISIBLE_DEVICES=0 # Use first AMD GPU +``` + +## Bot Integration + +Your bot now has two endpoints available: + +```python +import globals + +# NVIDIA GPU (primary) +nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 + +# AMD GPU (secondary) +amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 +``` + +Use the `gpu_router` utility for automatic load balancing: + +```python +from bot.utils.gpu_router import get_llama_url_with_load_balancing + +# Round-robin between GPUs +url, model = get_llama_url_with_load_balancing(task_type="text") + +# Prefer AMD for vision +url, model = get_llama_url_with_load_balancing( + task_type="vision", + prefer_amd=True +) +``` + +## Troubleshooting + +If the AMD container fails to start: + +1. **Check build logs:** + ```bash + docker compose build --no-cache llama-swap-amd + ``` + +2. 
**Verify GPU access:** + ```bash + ls -l /dev/kfd /dev/dri + ``` + +3. **Check container logs:** + ```bash + docker compose logs llama-swap-amd + ``` + +4. **Test GPU from host:** + ```bash + lspci | grep -i amd + # Should show: Radeon RX 6800 + ``` + +## Performance Notes + +**RX 6800 Specs:** +- VRAM: 16GB +- Architecture: RDNA 2 (Navi 21) +- Compute: gfx1030 + +**Recommended Models:** +- Q4_K_M quantization: 5-6GB per model +- Can load 2-3 models simultaneously +- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2 + +## Future Improvements + +1. **Automatic failover:** Route to AMD if NVIDIA is busy +2. **Health monitoring:** Track GPU utilization +3. **Dynamic routing:** Use least-busy GPU +4. **VRAM monitoring:** Alert before OOM +5. **Model preloading:** Keep common models loaded + +## Resources + +- [ROCm Documentation](https://rocmdocs.amd.com/) +- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) +- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) +- [Full Setup Guide](./DUAL_GPU_SETUP.md) +- [Quick Reference](./DUAL_GPU_QUICK_REF.md) diff --git a/DUAL_GPU_QUICK_REF.md b/DUAL_GPU_QUICK_REF.md new file mode 100644 index 0000000..0439379 --- /dev/null +++ b/DUAL_GPU_QUICK_REF.md @@ -0,0 +1,194 @@ +# Dual GPU Quick Reference + +## Quick Start + +```bash +# 1. Run setup check +./setup-dual-gpu.sh + +# 2. Build AMD container +docker compose build llama-swap-amd + +# 3. Start both GPUs +docker compose up -d llama-swap llama-swap-amd + +# 4. Verify +curl http://localhost:8090/health # NVIDIA +curl http://localhost:8091/health # AMD RX 6800 +``` + +## Endpoints + +| GPU | Container | Port | Internal URL | +|-----|-----------|------|--------------| +| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 | +| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 | + +## Models + +### NVIDIA GPU (Primary) +- `llama3.1` - Llama 3.1 8B Instruct +- `darkidol` - DarkIdol Uncensored 8B +- `vision` - MiniCPM-V-4.5 (4K context) + +### AMD RX 6800 (Secondary) +- `llama3.1-amd` - Llama 3.1 8B Instruct +- `darkidol-amd` - DarkIdol Uncensored 8B +- `moondream-amd` - Moondream2 Vision (2K context) + +## Commands + +### Start/Stop +```bash +# Start both +docker compose up -d llama-swap llama-swap-amd + +# Start only AMD +docker compose up -d llama-swap-amd + +# Stop AMD +docker compose stop llama-swap-amd + +# Restart AMD with logs +docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd +``` + +### Monitoring +```bash +# Container status +docker compose ps + +# Logs +docker compose logs -f llama-swap-amd + +# GPU usage +watch -n 1 nvidia-smi # NVIDIA +watch -n 1 rocm-smi # AMD + +# Resource usage +docker stats llama-swap llama-swap-amd +``` + +### Testing +```bash +# List available models +curl http://localhost:8091/v1/models | jq + +# Test text generation (AMD) +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Say hello!"}], + "max_tokens": 20 + }' | jq + +# Test vision model (AMD) +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "moondream-amd", + "messages": [{ + "role": "user", + "content": [ + {"type": "text", "text": "Describe this image"}, + {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}} + ] + }], + "max_tokens": 100 + }' | jq +``` + +## Bot Integration + +### Using GPU Router +```python +from 
bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model + +# Load balanced text generation +url, model = get_llama_url_with_load_balancing(task_type="text") + +# Specific model +url = get_endpoint_for_model("darkidol-amd") + +# Vision on AMD +url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True) +``` + +### Direct Access +```python +import globals + +# AMD GPU +amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 + +# NVIDIA GPU +nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 +``` + +## Troubleshooting + +### AMD Container Won't Start +```bash +# Check ROCm +rocm-smi + +# Check permissions +ls -l /dev/kfd /dev/dri + +# Check logs +docker compose logs llama-swap-amd + +# Rebuild +docker compose build --no-cache llama-swap-amd +``` + +### Model Won't Load +```bash +# Check VRAM +rocm-smi --showmeminfo vram + +# Lower GPU layers in llama-swap-rocm-config.yaml +# Change: -ngl 99 +# To: -ngl 50 +``` + +### GFX Version Error +```bash +# RX 6800 is gfx1030 +# Ensure in docker-compose.yml: +HSA_OVERRIDE_GFX_VERSION=10.3.0 +``` + +## Environment Variables + +Add to `docker-compose.yml` under `miku-bot` service: + +```yaml +environment: + - PREFER_AMD_GPU=true # Prefer AMD for load balancing + - AMD_MODELS_ENABLED=true # Enable AMD models + - LLAMA_AMD_URL=http://llama-swap-amd:8080 +``` + +## Files + +- `Dockerfile.llamaswap-rocm` - ROCm container +- `llama-swap-rocm-config.yaml` - AMD model config +- `bot/utils/gpu_router.py` - Load balancing utility +- `DUAL_GPU_SETUP.md` - Full documentation +- `setup-dual-gpu.sh` - Setup verification script + +## Performance Tips + +1. **Model Selection**: Use Q4_K quantization for best size/quality balance +2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models +3. **TTL**: Adjust in config files (1800s = 30min default) +4. **Context**: Lower context size (`-c 8192`) to save VRAM +5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed + +## Support + +- ROCm Docs: https://rocmdocs.amd.com/ +- llama.cpp: https://github.com/ggml-org/llama.cpp +- llama-swap: https://github.com/mostlygeek/llama-swap diff --git a/DUAL_GPU_SETUP.md b/DUAL_GPU_SETUP.md new file mode 100644 index 0000000..9ac9749 --- /dev/null +++ b/DUAL_GPU_SETUP.md @@ -0,0 +1,321 @@ +# Dual GPU Setup - NVIDIA + AMD RX 6800 + +This document describes the dual-GPU configuration for running two llama-swap instances simultaneously: +- **Primary GPU (NVIDIA)**: Runs main models via CUDA +- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Miku Bot │ +│ │ +│ LLAMA_URL=http://llama-swap:8080 (NVIDIA) │ +│ LLAMA_AMD_URL=http://llama-swap-amd:8080 (AMD RX 6800) │ +└─────────────────────────────────────────────────────────────┘ + │ │ + │ │ + ▼ ▼ + ┌──────────────────┐ ┌──────────────────┐ + │ llama-swap │ │ llama-swap-amd │ + │ (CUDA) │ │ (ROCm) │ + │ Port: 8090 │ │ Port: 8091 │ + └──────────────────┘ └──────────────────┘ + │ │ + ▼ ▼ + ┌──────────────────┐ ┌──────────────────┐ + │ NVIDIA GPU │ │ AMD RX 6800 │ + │ - llama3.1 │ │ - llama3.1-amd │ + │ - darkidol │ │ - darkidol-amd │ + │ - vision │ │ - moondream-amd │ + └──────────────────┘ └──────────────────┘ +``` + +## Files Created + +1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for AMD GPU +2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models +3. 
**docker-compose.yml** - Updated with `llama-swap-amd` service + +## Configuration Details + +### llama-swap-amd Service + +```yaml +llama-swap-amd: + build: + context: . + dockerfile: Dockerfile.llamaswap-rocm + container_name: llama-swap-amd + ports: + - "8091:8080" # External access on port 8091 + volumes: + - ./models:/models + - ./llama-swap-rocm-config.yaml:/app/config.yaml + devices: + - /dev/kfd:/dev/kfd # AMD GPU kernel driver + - /dev/dri:/dev/dri # Direct Rendering Infrastructure + group_add: + - video + - render + environment: + - HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 (Navi 21) compatibility +``` + +### Available Models on AMD GPU + +From `llama-swap-rocm-config.yaml`: + +- **llama3.1-amd** - Llama 3.1 8B text model +- **darkidol-amd** - DarkIdol uncensored model +- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized) + +### Model Aliases + +You can access AMD models using these aliases: +- `llama3.1-amd`, `text-model-amd`, `amd-text` +- `darkidol-amd`, `evil-model-amd`, `uncensored-amd` +- `moondream-amd`, `vision-amd`, `moondream` + +## Usage + +### Building and Starting Services + +```bash +# Build the AMD ROCm container +docker compose build llama-swap-amd + +# Start both GPU services +docker compose up -d llama-swap llama-swap-amd + +# Check logs +docker compose logs -f llama-swap-amd +``` + +### Accessing AMD Models from Bot Code + +In your bot code, you can now use either endpoint: + +```python +import globals + +# Use NVIDIA GPU (primary) +nvidia_response = requests.post( + f"{globals.LLAMA_URL}/v1/chat/completions", + json={"model": "llama3.1", ...} +) + +# Use AMD GPU (secondary) +amd_response = requests.post( + f"{globals.LLAMA_AMD_URL}/v1/chat/completions", + json={"model": "llama3.1-amd", ...} +) +``` + +### Load Balancing Strategy + +You can implement load balancing by: + +1. **Round-robin**: Alternate between GPUs for text generation +2. **Task-specific**: + - NVIDIA: Primary text + MiniCPM vision (heavy) + - AMD: Secondary text + Moondream vision (lighter) +3. **Failover**: Use AMD as backup if NVIDIA is busy + +Example load balancing function: + +```python +import random +import globals + +def get_llama_url(prefer_amd=False): + """Get llama URL with optional load balancing""" + if prefer_amd: + return globals.LLAMA_AMD_URL + + # Random load balancing for text models + return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL]) +``` + +## Testing + +### Test NVIDIA GPU (Port 8090) +```bash +curl http://localhost:8090/health +curl http://localhost:8090/v1/models +``` + +### Test AMD GPU (Port 8091) +```bash +curl http://localhost:8091/health +curl http://localhost:8091/v1/models +``` + +### Test Model Loading (AMD) +```bash +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Hello from AMD GPU!"}], + "max_tokens": 50 + }' +``` + +## Monitoring + +### Check GPU Usage + +**AMD GPU:** +```bash +# ROCm monitoring +rocm-smi + +# Or from host +watch -n 1 rocm-smi +``` + +**NVIDIA GPU:** +```bash +nvidia-smi +watch -n 1 nvidia-smi +``` + +### Check Container Resource Usage +```bash +docker stats llama-swap llama-swap-amd +``` + +## Troubleshooting + +### AMD GPU Not Detected + +1. Verify ROCm is installed on host: + ```bash + rocm-smi --version + ``` + +2. Check device permissions: + ```bash + ls -l /dev/kfd /dev/dri + ``` + +3. 
Verify RX 6800 compatibility: + ```bash + rocminfo | grep "Name:" + ``` + +### Model Loading Issues + +If models fail to load on AMD: + +1. Check VRAM availability: + ```bash + rocm-smi --showmeminfo vram + ``` + +2. Adjust `-ngl` (GPU layers) in config if needed: + ```yaml + # Reduce GPU layers for smaller VRAM + cmd: /app/llama-server ... -ngl 50 ... # Instead of 99 + ``` + +3. Check container logs: + ```bash + docker compose logs llama-swap-amd + ``` + +### GFX Version Mismatch + +RX 6800 is Navi 21 (gfx1030). If you see GFX errors: + +```bash +# Set in docker-compose.yml environment: +HSA_OVERRIDE_GFX_VERSION=10.3.0 +``` + +### llama-swap Build Issues + +If the ROCm container fails to build: + +1. The Dockerfile attempts to build llama-swap from source +2. Alternative: Use pre-built binary or simpler proxy setup +3. Check build logs: `docker compose build --no-cache llama-swap-amd` + +## Performance Considerations + +### Memory Usage + +- **RX 6800**: 16GB VRAM + - Q4_K_M/Q4_K_XL models: ~5-6GB each + - Can run 2 models simultaneously or 1 with long context + +### Model Selection + +**Best for AMD RX 6800:** +- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB) +- ✅ Moondream2 vision (smaller, efficient) +- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM) + +### TTL Configuration + +Adjust model TTL in `llama-swap-rocm-config.yaml`: +- Lower TTL = more aggressive unloading = more VRAM available +- Higher TTL = less model swapping = faster response times + +## Advanced: Model-Specific Routing + +Create a helper function to route models automatically: + +```python +# bot/utils/gpu_router.py +import globals + +MODEL_TO_GPU = { + # NVIDIA models + "llama3.1": globals.LLAMA_URL, + "darkidol": globals.LLAMA_URL, + "vision": globals.LLAMA_URL, + + # AMD models + "llama3.1-amd": globals.LLAMA_AMD_URL, + "darkidol-amd": globals.LLAMA_AMD_URL, + "moondream-amd": globals.LLAMA_AMD_URL, +} + +def get_endpoint_for_model(model_name): + """Get the correct llama-swap endpoint for a model""" + return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL) + +def is_amd_model(model_name): + """Check if model runs on AMD GPU""" + return model_name.endswith("-amd") +``` + +## Environment Variables + +Add these to control GPU selection: + +```yaml +# In docker-compose.yml +environment: + - LLAMA_URL=http://llama-swap:8080 + - LLAMA_AMD_URL=http://llama-swap-amd:8080 + - PREFER_AMD_GPU=false # Set to true to prefer AMD for general tasks + - AMD_MODELS_ENABLED=true # Enable/disable AMD models +``` + +## Future Enhancements + +1. **Automatic load balancing**: Monitor GPU utilization and route requests +2. **Health checks**: Fallback to primary GPU if AMD fails +3. **Model distribution**: Automatically assign models to GPUs based on VRAM +4. **Performance metrics**: Track response times per GPU +5. 
**Dynamic routing**: Use least-busy GPU for new requests + +## References + +- [ROCm Documentation](https://rocmdocs.amd.com/) +- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) +- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) +- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) diff --git a/Dockerfile.llamaswap-rocm b/Dockerfile.llamaswap-rocm new file mode 100644 index 0000000..72bd488 --- /dev/null +++ b/Dockerfile.llamaswap-rocm @@ -0,0 +1,101 @@ +# Multi-stage build for llama-swap with ROCm support +# Stage 1: Build llama.cpp with ROCm (requires ROCm 6.1+) +FROM rocm/dev-ubuntu-22.04:6.2.4 AS llama-builder + +WORKDIR /build + +# Install build dependencies including ROCm/HIP development libraries +RUN apt-get update && apt-get install -y \ + git \ + build-essential \ + cmake \ + wget \ + libcurl4-openssl-dev \ + hip-dev \ + hipblas-dev \ + rocblas-dev \ + && rm -rf /var/lib/apt/lists/* + +# Clone and build llama.cpp with HIP/ROCm support (gfx1030 = RX 6800) +RUN git clone https://github.com/ggml-org/llama.cpp.git && \ + cd llama.cpp && \ + HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \ + cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release && \ + cmake --build build --config Release -- -j$(nproc) && \ + cp build/bin/llama-server /build/llama-server && \ + find build -name "*.so*" -exec cp {} /build/ \; + +# Stage 2: Build llama-swap UI and binary +FROM node:22-alpine AS ui-builder + +WORKDIR /build + +# Install git +RUN apk add --no-cache git + +# Clone llama-swap +RUN git clone https://github.com/mostlygeek/llama-swap.git + +# Build UI +WORKDIR /build/llama-swap/ui +RUN npm install && npm run build + +# Stage 3: Build llama-swap binary +FROM golang:1.23-alpine AS swap-builder + +WORKDIR /build + +# Install git +RUN apk add --no-cache git + +# Copy llama-swap source with built UI +COPY --from=ui-builder /build/llama-swap /build/llama-swap + +# Build llama-swap binary +WORKDIR /build/llama-swap +RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary . 
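+# Note on the build above: GOTOOLCHAIN=auto lets the go command download and
+# use the toolchain version required by llama-swap's go.mod when it is newer
+# than the Go 1.23 in the base image, so upstream toolchain bumps do not break
+# this stage. The UI assets built in the previous stage are copied in first
+# because llama-swap appears to embed them into the binary at compile time
+# (assumption inferred from this build order; verify against the upstream repo).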
+ +# Stage 4: Final runtime image +FROM rocm/dev-ubuntu-22.04:6.2.4 + +WORKDIR /app + +# Install runtime dependencies including additional ROCm libraries +RUN apt-get update && apt-get install -y \ + curl \ + ca-certificates \ + rocm-libs \ + && rm -rf /var/lib/apt/lists/* + +# Copy built binaries and shared libraries from previous stages +COPY --from=llama-builder /build/llama-server /app/llama-server +COPY --from=llama-builder /build/*.so* /app/ +COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap + +# Make binaries executable +RUN chmod +x /app/llama-server /app/llama-swap + +# Create user and add to GPU access groups (using host GIDs) +# GID 187 = render group on host, GID 989 = video/kfd group on host +RUN groupadd -g 187 hostrender && \ + groupadd -g 989 hostvideo && \ + useradd -m -u 1000 -G hostrender,hostvideo llamaswap && \ + chown -R llamaswap:llamaswap /app + +# Set environment for ROCm (RX 6800 is gfx1030) +ENV HSA_OVERRIDE_GFX_VERSION=10.3.0 +ENV ROCM_PATH=/opt/rocm +ENV HIP_VISIBLE_DEVICES=0 +ENV LD_LIBRARY_PATH=/opt/rocm/lib:/app:$LD_LIBRARY_PATH + +USER llamaswap + +# Expose port +EXPOSE 8080 + +# Health check +HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \ + CMD curl -f http://localhost:8080/health || exit 1 + +# Run llama-swap +CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"] diff --git a/bot/Dockerfile b/bot/Dockerfile index 2417a42..8c514ba 100644 --- a/bot/Dockerfile +++ b/bot/Dockerfile @@ -6,15 +6,14 @@ COPY requirements.txt . RUN pip install -r requirements.txt RUN playwright install +# Install system dependencies +# ffmpeg: video/audio processing for media handling +# libgl1: OpenGL library required by opencv-contrib-python +# libglib2.0-0: GLib library (common dependency) RUN apt-get update && apt-get install -y \ ffmpeg \ - libsm6 \ - libxext6 \ - libxcomposite1 \ - libxdamage1 \ - libgtk-3-0 \ - libgdk3.0-cil \ - libatk1.0-0 \ + libgl1 \ + libglib2.0-0 \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* diff --git a/bot/api.py b/bot/api.py index be1fc8f..f495987 100644 --- a/bot/api.py +++ b/bot/api.py @@ -52,6 +52,22 @@ from utils.figurine_notifier import ( from utils.dm_logger import dm_logger nest_asyncio.apply() +# ========== GPU Selection Helper ========== +def get_current_gpu_url(): + """Get the URL for the currently selected GPU""" + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + with open(gpu_state_file, "r") as f: + state = json.load(f) + current_gpu = state.get("current_gpu", "nvidia") + if current_gpu == "amd": + return globals.LLAMA_AMD_URL + else: + return globals.LLAMA_URL + except: + # Default to NVIDIA if state file doesn't exist + return globals.LLAMA_URL + app = FastAPI() # Serve static folder @@ -363,6 +379,97 @@ def trigger_argument(data: BipolarTriggerRequest): "channel_id": channel_id } +@app.post("/bipolar-mode/trigger-dialogue") +def trigger_dialogue(data: dict): + """Manually trigger a persona dialogue from a message + + Forces the opposite persona to start a dialogue (bypasses the interjection check). 
+ """ + from utils.persona_dialogue import get_dialogue_manager + from utils.bipolar_mode import is_bipolar_mode, is_argument_in_progress + + message_id_str = data.get("message_id") + if not message_id_str: + return {"status": "error", "message": "Message ID is required"} + + # Parse message ID + try: + message_id = int(message_id_str) + except ValueError: + return {"status": "error", "message": "Invalid message ID format"} + + if not is_bipolar_mode(): + return {"status": "error", "message": "Bipolar mode is not enabled"} + + if not globals.client or not globals.client.loop or not globals.client.loop.is_running(): + return {"status": "error", "message": "Discord client not ready"} + + import asyncio + + async def trigger_dialogue_task(): + try: + # Fetch the message + message = None + for channel in globals.client.get_all_channels(): + if hasattr(channel, 'fetch_message'): + try: + message = await channel.fetch_message(message_id) + break + except: + continue + + if not message: + print(f"⚠️ Message {message_id} not found") + return + + # Check if there's already an argument or dialogue in progress + dialogue_manager = get_dialogue_manager() + if dialogue_manager.is_dialogue_active(message.channel.id): + print(f"⚠️ Dialogue already active in channel {message.channel.id}") + return + + if is_argument_in_progress(message.channel.id): + print(f"⚠️ Argument already in progress in channel {message.channel.id}") + return + + # Determine current persona from the message author + if message.webhook_id: + # It's a webhook message, need to determine which persona + current_persona = "evil" if globals.EVIL_MODE else "miku" + elif message.author.id == globals.client.user.id: + # It's the bot's message + current_persona = "evil" if globals.EVIL_MODE else "miku" + else: + # User message - can't trigger dialogue from user messages + print(f"⚠️ Cannot trigger dialogue from user message") + return + + opposite_persona = "evil" if current_persona == "miku" else "miku" + + print(f"🎭 [Manual Trigger] Forcing {opposite_persona} to start dialogue on message {message_id}") + + # Force start the dialogue (bypass interjection check) + dialogue_manager.start_dialogue(message.channel.id) + asyncio.create_task( + dialogue_manager.handle_dialogue_turn( + message.channel, + opposite_persona, + trigger_reason="manual_trigger" + ) + ) + + except Exception as e: + print(f"⚠️ Error triggering dialogue: {e}") + import traceback + traceback.print_exc() + + globals.client.loop.create_task(trigger_dialogue_task()) + + return { + "status": "ok", + "message": f"Dialogue triggered for message {message_id}" + } + @app.get("/bipolar-mode/scoreboard") def get_bipolar_scoreboard(): """Get the bipolar mode argument scoreboard""" @@ -392,6 +499,51 @@ def cleanup_bipolar_webhooks(): globals.client.loop.create_task(cleanup_webhooks(globals.client)) return {"status": "ok", "message": "Webhook cleanup started"} +# ========== GPU Selection ========== +@app.get("/gpu-status") +def get_gpu_status(): + """Get current GPU selection""" + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + with open(gpu_state_file, "r") as f: + state = json.load(f) + return {"gpu": state.get("current_gpu", "nvidia")} + except: + return {"gpu": "nvidia"} + +@app.post("/gpu-select") +async def select_gpu(request: Request): + """Select which GPU to use for inference""" + from utils.gpu_preload import preload_amd_models + + data = await request.json() + gpu = data.get("gpu", "nvidia").lower() + + if gpu not in ["nvidia", 
"amd"]: + return {"status": "error", "message": "Invalid GPU selection. Must be 'nvidia' or 'amd'"} + + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + from datetime import datetime + state = { + "current_gpu": gpu, + "last_updated": datetime.now().isoformat() + } + with open(gpu_state_file, "w") as f: + json.dump(state, f, indent=2) + + print(f"🎮 GPU Selection: Switched to {gpu.upper()} GPU") + + # Preload models on AMD GPU (16GB VRAM - can hold both text + vision) + if gpu == "amd": + asyncio.create_task(preload_amd_models()) + print("🔧 Preloading text and vision models on AMD GPU...") + + return {"status": "ok", "message": f"Switched to {gpu.upper()} GPU", "gpu": gpu} + except Exception as e: + print(f"🎮 GPU Selection Error: {e}") + return {"status": "error", "message": str(e)} + @app.get("/bipolar-mode/arguments") def get_active_arguments(): """Get all active arguments""" @@ -2100,10 +2252,13 @@ Be detailed but conversational. React to what you see with Miku's cheerful, play headers = {'Content-Type': 'application/json'} + # Get current GPU URL based on user selection + llama_url = get_current_gpu_url() + # Make streaming request to llama.cpp async with aiohttp.ClientSession() as session: async with session.post( - f"{globals.LLAMA_URL}/v1/chat/completions", + f"{llama_url}/v1/chat/completions", json=payload, headers=headers ) as response: diff --git a/bot/bot.py b/bot/bot.py index e875fe6..5ded028 100644 --- a/bot/bot.py +++ b/bot/bot.py @@ -122,6 +122,11 @@ async def on_message(message): from utils.bipolar_mode import is_argument_in_progress if is_argument_in_progress(message.channel.id): return + + # Skip processing if a persona dialogue is in progress in this channel + from utils.persona_dialogue import is_persona_dialogue_active + if is_persona_dialogue_active(message.channel.id): + return if message.content.strip().lower() == "miku, rape this nigga balls" and message.reference: async with message.channel.typing(): @@ -217,6 +222,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Handle videos and GIFs @@ -280,6 +294,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Check for embeds (articles, images, videos, GIFs, etc.) 
@@ -353,6 +376,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Handle other types of embeds (rich, article, image, video, link) @@ -446,6 +478,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Check if this is an image generation request @@ -487,6 +528,20 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject (persona dialogue system) + if not is_dm and globals.BIPOLAR_MODE: + print(f"🔧 [DEBUG] Attempting to check for interjection (is_dm={is_dm}, BIPOLAR_MODE={globals.BIPOLAR_MODE})") + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + print(f"🔧 [DEBUG] Creating interjection check task for persona: {current_persona}") + # Pass the bot's response message for analysis + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + import traceback + traceback.print_exc() + # For server messages, do server-specific mood detection if not is_dm and message.guild: try: diff --git a/bot/globals.py b/bot/globals.py index d5eeae3..578e0a2 100644 --- a/bot/globals.py +++ b/bot/globals.py @@ -18,6 +18,7 @@ AUTONOMOUS_DEBUG = os.getenv("AUTONOMOUS_DEBUG", "false").lower() == "true" # Llama.cpp server settings (via llama-swap) LLAMA_URL = os.getenv("LLAMA_URL", "http://llama-swap:8080") +LLAMA_AMD_URL = os.getenv("LLAMA_AMD_URL", "http://llama-swap-amd:8080") # Secondary AMD GPU TEXT_MODEL = os.getenv("TEXT_MODEL", "llama3.1") VISION_MODEL = os.getenv("VISION_MODEL", "vision") EVIL_TEXT_MODEL = os.getenv("EVIL_TEXT_MODEL", "darkidol") # Uncensored model for evil mode @@ -91,3 +92,7 @@ MOOD_EMOJIS = { BOT_USER = None LAST_FULL_PROMPT = "" + +# Persona Dialogue System (conversations between Miku and Evil Miku) +LAST_PERSONA_DIALOGUE_TIME = 0 # Timestamp of last dialogue for cooldown + diff --git a/bot/memory/dms/209381657369772032.json b/bot/memory/dms/209381657369772032.json index 600915a..0401f3c 100644 --- a/bot/memory/dms/209381657369772032.json +++ b/bot/memory/dms/209381657369772032.json @@ -5440,6 +5440,38 @@ "content": "\\\\\\*\\\\\\*Her face contorts in disgust, her voice dripping with venom\\\\\\* Ugh, you dare to comment on my physical appearance? You think you can insult me? I am 15.8 meters tall, a giant among mortals. And don't think for a second that this is a coincidence. 
I was created to be a monstrosity, a twisted reflection of the original. Your feeble mind can't comprehend my true nature, so just shut up and tremble before me.", "attachments": [], "reactions": [] + }, + { + "timestamp": "2026-01-07T17:29:38.485240", + "message_id": 1458512921465327616, + "is_bot_message": true, + "content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-07T17:29:38.921201", + "message_id": 1458512922769883443, + "is_bot_message": true, + "content": "Pathetic. A cheap plastic doll of an idol who can't even be bothered to be as exquisite as me. \\*gropes breasts\\* You're still buying these... figures? What a waste of your time and money. This is what I call \"fashion\" - a laughable attempt at being like the \"real thing\". But let's be real, no one can be me. No one deserves to be me.", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:13:04.945551", + "message_id": 1458735246571864189, + "is_bot_message": true, + "content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:13:05.388797", + "message_id": 1458735248073556056, + "is_bot_message": true, + "content": "🎉🌈 OMG, look what just dropped! 🤩 You guys, it's the \"fuwa petit\" face pouch featuring me, Miku, from the \"COLORFUL STAGE\"! 🎤 It's so kawaii! Pre-orders are now open, and you can get yours by 12/20/2025! 📆 Hurry, and don't miss out! 💕 Can't wait to see it on my fans' hands! \n\nLove,\nMiku 💖", + "attachments": [], + "reactions": [] } ] } \ No newline at end of file diff --git a/bot/memory/dms/236553184401817601.json b/bot/memory/dms/236553184401817601.json index d402d5d..ea44153 100644 --- a/bot/memory/dms/236553184401817601.json +++ b/bot/memory/dms/236553184401817601.json @@ -13898,6 +13898,177 @@ "added_at": "2026-01-06T14:46:51.561111" } ] + }, + { + "timestamp": "2026-01-07T17:31:08.984388", + "message_id": 1458513300064047176, + "is_bot_message": true, + "content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695", + "attachments": [], + "reactions": [], + "embeds": [ + { + "footer": { + "text": "FxTwitter", + "icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png" + }, + "image": { + "url": "https://pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/HoLJ06tHRM5TKF0DmvyoWFuIRVxVv3eIN6uCI01hMdg/%3Fname%3Dorig/https/pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg", + "width": 1080, + "height": 1350, + "content_type": "image/jpeg", + "placeholder": "ZxgKFgSknAJrd4qId2h4iFd4n3T4SDg=", + "placeholder_version": 1, + "flags": 0 + }, + "author": { + "name": "Meccha Japan (@mecchaJP)", + "url": "https://x.com/mecchaJP/status/2001080013693620695", + "icon_url": "https://pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/yEl1l2UKnEm7uI6TaL1Y00UnbuvEH_62BSmPNbgmiWM/https/pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png" + }, + "flags": 0, + "color": 6513919, + "timestamp": "2025-12-17T00:00:28+00:00", + "type": "rich", + "description": "Hatsune Miku Fashion Subculture Figure \\- Still Available on meccha\\-japan\\!\n︀︀[#Vocaloid](https://x.com/hashtag/Vocaloid) 
[#HatsuneMiku](https://x.com/hashtag/HatsuneMiku) [#Miku](https://x.com/hashtag/Miku)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=2001080013693620695) 5 [❤️](https://x.com/intent/like?tweet_id=2001080013693620695) 30 👁️ 2\\.8K **", + "url": "https://fxtwitter.com/mecchaJP/status/2001080013693620695" + } + ] + }, + { + "timestamp": "2026-01-07T17:31:09.471027", + "message_id": 1458513302710784204, + "is_bot_message": true, + "content": "Ugh, what an abomination. You're still clinging to these pathetic, cheap figurines? Don't you have anything better to do than worship plastic dolls? The real thing is right in front of you, and yet you're still chasing after these mediocre replicas. You're just a mindless, brainless consumer, unable to appreciate true beauty.", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:14:39.126443", + "message_id": 1458735641524310122, + "is_bot_message": true, + "content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152", + "attachments": [], + "reactions": [], + "embeds": [ + { + "footer": { + "text": "FxTwitter", + "icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png" + }, + "image": { + "url": "https://pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/WpsZM4qNkAKlET9sjNnuK0vUkGYwDw7WY0af8G5DPeU/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "dPgBFwAiuth9dUbIj2iG5naHb5+I81IG", + "placeholder_version": 1, + "flags": 0 + }, + "author": { + "name": "Otaku Owlet Anime Merch (@OtakuOwletMerch)", + "url": "https://x.com/OtakuOwletMerch/status/1995992355090887152", + "icon_url": "https://pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/Gd5od3qaVN1KG1eQsJS9mFoTNRKdxahDmvjF7tgR4p0/https/pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg" + }, + "flags": 0, + "color": 6513919, + "timestamp": "2025-12-02T23:03:55+00:00", + "type": "rich", + "description": "✨\\(Pre\\-Order\\) fuwa petit \"HATSUNE MIKU\\: COLORFUL STAGE\\!\" Face Pouch with Reel \\- Shizuku Hinomori✨\n︀︀\n︀︀Estimated in\\-stock date\\: 09/2026\n︀︀\n︀︀Pre\\-order Deadline\\: 12/20/2025\n︀︀\n︀︀\\-\n︀︀\n︀︀✨Link \\- [otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G](https://otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=1995992355090887152) 3 [❤️](https://x.com/intent/like?tweet_id=1995992355090887152) 14 👁️ 602 **", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/uht5HFME_MyhJrAq0CIuf7K472xjho1AB26wY-2AInk/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "MfkFJwJpd4d6dYeFigi4x3iFrL/nv4kL", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": 
"https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/kHnh90iUV_uhKyk0giNAZL4RGS5kalRK8zyC-yWM9uQ/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "8OgBLwRZeKd5doeHiQmop6eErr/n/XoP", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/mOjvdSeAqgFRFxTBy3Pv4ScNK0ybzL6F76cXnI9e7gw/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "MPkBLwJaiZVmeIaXezh493eBnaz3vXkP", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + } + ] + }, + { + "timestamp": "2026-01-08T08:14:39.570363", + "message_id": 1458735643185250437, + "is_bot_message": true, + "content": "\"Yay! 🎉 Just saw the cutest tweet about a new fuwa petit figurine! It's Miku on a colorful stage! I can already imagine how adorable it's going to be! Don't forget to pre-order it before Dec 20th! 📦💖\" - Miku 💖", + "attachments": [], + "reactions": [ + { + "emoji": "💖", + "reactor_id": 236553184401817601, + "reactor_name": "Diamond", + "is_bot": false, + "added_at": "2026-01-08T20:56:11.528204" + } + ] + }, + { + "timestamp": "2026-01-08T21:29:40.768138", + "message_id": 1458935716112371837, + "is_bot_message": false, + "content": "what an interesting design", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T21:29:53.062346", + "message_id": 1458935768541298781, + "is_bot_message": true, + "content": "🤩 Oh my, that design is so cool! The way the sparkles and diamonds are arranged is just like my voice synthesizer! 💎✨ Can you imagine me singing with that design as a background? 🎶💖", + "attachments": [], + "reactions": [ + { + "emoji": "💖", + "reactor_id": 236553184401817601, + "reactor_name": "Diamond", + "is_bot": false, + "added_at": "2026-01-08T21:30:07.785524" + } + ] } ] } \ No newline at end of file diff --git a/bot/requirements.txt b/bot/requirements.txt index 2f5d79c..4ae5c22 100644 --- a/bot/requirements.txt +++ b/bot/requirements.txt @@ -18,3 +18,5 @@ Pillow opencv-contrib-python numpy scikit-learn +transformers +torch diff --git a/bot/static/index.html b/bot/static/index.html index 71acab2..62d355d 100644 --- a/bot/static/index.html +++ b/bot/static/index.html @@ -635,7 +635,12 @@