diff --git a/DUAL_GPU_BUILD_SUMMARY.md b/DUAL_GPU_BUILD_SUMMARY.md new file mode 100644 index 0000000..acf7430 --- /dev/null +++ b/DUAL_GPU_BUILD_SUMMARY.md @@ -0,0 +1,184 @@ +# Dual GPU Setup Summary + +## What We Built + +A secondary llama-swap container optimized for your **AMD RX 6800** GPU using ROCm. + +### Architecture + +``` +Primary GPU (NVIDIA GTX 1660) Secondary GPU (AMD RX 6800) + ↓ ↓ + llama-swap (CUDA) llama-swap-amd (ROCm) + Port: 8090 Port: 8091 + ↓ ↓ + NVIDIA models AMD models + - llama3.1 - llama3.1-amd + - darkidol - darkidol-amd + - vision (MiniCPM) - moondream-amd +``` + +## Files Created + +1. **Dockerfile.llamaswap-rocm** - Custom multi-stage build: + - Stage 1: Builds llama.cpp with ROCm from source + - Stage 2: Builds llama-swap from source + - Stage 3: Runtime image with both binaries + +2. **llama-swap-rocm-config.yaml** - Model configuration for AMD GPU + +3. **docker-compose.yml** - Updated with `llama-swap-amd` service + +4. **bot/utils/gpu_router.py** - Load balancing utility + +5. **bot/globals.py** - Updated with `LLAMA_AMD_URL` + +6. **setup-dual-gpu.sh** - Setup verification script + +7. **DUAL_GPU_SETUP.md** - Comprehensive documentation + +8. **DUAL_GPU_QUICK_REF.md** - Quick reference guide + +## Why Custom Build? + +- llama.cpp doesn't publish ROCm Docker images (yet) +- llama-swap doesn't provide ROCm variants +- Building from source ensures latest ROCm compatibility +- Full control over compilation flags and optimization + +## Build Time + +The initial build takes 15-30 minutes depending on your system: +- llama.cpp compilation: ~10-20 minutes +- llama-swap compilation: ~1-2 minutes +- Image layering: ~2-5 minutes + +Subsequent builds are much faster due to Docker layer caching. + +## Next Steps + +Once the build completes: + +```bash +# 1. Start both GPU services +docker compose up -d llama-swap llama-swap-amd + +# 2. Verify both are running +docker compose ps + +# 3. Test NVIDIA GPU +curl http://localhost:8090/health + +# 4. Test AMD GPU +curl http://localhost:8091/health + +# 5. Monitor logs +docker compose logs -f llama-swap-amd + +# 6. Test model loading on AMD +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Hello!"}], + "max_tokens": 50 + }' +``` + +## Device Access + +The AMD container has access to: +- `/dev/kfd` - AMD GPU kernel driver +- `/dev/dri` - Direct Rendering Infrastructure +- Groups: `video`, `render` + +## Environment Variables + +RX 6800 specific settings: +```yaml +HSA_OVERRIDE_GFX_VERSION=10.3.0 # Navi 21 (gfx1030) compatibility +ROCM_PATH=/opt/rocm +HIP_VISIBLE_DEVICES=0 # Use first AMD GPU +``` + +## Bot Integration + +Your bot now has two endpoints available: + +```python +import globals + +# NVIDIA GPU (primary) +nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 + +# AMD GPU (secondary) +amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 +``` + +Use the `gpu_router` utility for automatic load balancing: + +```python +from bot.utils.gpu_router import get_llama_url_with_load_balancing + +# Round-robin between GPUs +url, model = get_llama_url_with_load_balancing(task_type="text") + +# Prefer AMD for vision +url, model = get_llama_url_with_load_balancing( + task_type="vision", + prefer_amd=True +) +``` + +## Troubleshooting + +If the AMD container fails to start: + +1. **Check build logs:** + ```bash + docker compose build --no-cache llama-swap-amd + ``` + +2. 
**Verify GPU access:** + ```bash + ls -l /dev/kfd /dev/dri + ``` + +3. **Check container logs:** + ```bash + docker compose logs llama-swap-amd + ``` + +4. **Test GPU from host:** + ```bash + lspci | grep -i amd + # Should show: Radeon RX 6800 + ``` + +## Performance Notes + +**RX 6800 Specs:** +- VRAM: 16GB +- Architecture: RDNA 2 (Navi 21) +- Compute: gfx1030 + +**Recommended Models:** +- Q4_K_M quantization: 5-6GB per model +- Can load 2-3 models simultaneously +- Good for: Llama 3.1 8B, DarkIdol 8B, Moondream2 + +## Future Improvements + +1. **Automatic failover:** Route to AMD if NVIDIA is busy +2. **Health monitoring:** Track GPU utilization +3. **Dynamic routing:** Use least-busy GPU +4. **VRAM monitoring:** Alert before OOM +5. **Model preloading:** Keep common models loaded + +## Resources + +- [ROCm Documentation](https://rocmdocs.amd.com/) +- [llama.cpp ROCm Build](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) +- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) +- [Full Setup Guide](./DUAL_GPU_SETUP.md) +- [Quick Reference](./DUAL_GPU_QUICK_REF.md) diff --git a/DUAL_GPU_QUICK_REF.md b/DUAL_GPU_QUICK_REF.md new file mode 100644 index 0000000..0439379 --- /dev/null +++ b/DUAL_GPU_QUICK_REF.md @@ -0,0 +1,194 @@ +# Dual GPU Quick Reference + +## Quick Start + +```bash +# 1. Run setup check +./setup-dual-gpu.sh + +# 2. Build AMD container +docker compose build llama-swap-amd + +# 3. Start both GPUs +docker compose up -d llama-swap llama-swap-amd + +# 4. Verify +curl http://localhost:8090/health # NVIDIA +curl http://localhost:8091/health # AMD RX 6800 +``` + +## Endpoints + +| GPU | Container | Port | Internal URL | +|-----|-----------|------|--------------| +| NVIDIA | llama-swap | 8090 | http://llama-swap:8080 | +| AMD RX 6800 | llama-swap-amd | 8091 | http://llama-swap-amd:8080 | + +## Models + +### NVIDIA GPU (Primary) +- `llama3.1` - Llama 3.1 8B Instruct +- `darkidol` - DarkIdol Uncensored 8B +- `vision` - MiniCPM-V-4.5 (4K context) + +### AMD RX 6800 (Secondary) +- `llama3.1-amd` - Llama 3.1 8B Instruct +- `darkidol-amd` - DarkIdol Uncensored 8B +- `moondream-amd` - Moondream2 Vision (2K context) + +## Commands + +### Start/Stop +```bash +# Start both +docker compose up -d llama-swap llama-swap-amd + +# Start only AMD +docker compose up -d llama-swap-amd + +# Stop AMD +docker compose stop llama-swap-amd + +# Restart AMD with logs +docker compose restart llama-swap-amd && docker compose logs -f llama-swap-amd +``` + +### Monitoring +```bash +# Container status +docker compose ps + +# Logs +docker compose logs -f llama-swap-amd + +# GPU usage +watch -n 1 nvidia-smi # NVIDIA +watch -n 1 rocm-smi # AMD + +# Resource usage +docker stats llama-swap llama-swap-amd +``` + +### Testing +```bash +# List available models +curl http://localhost:8091/v1/models | jq + +# Test text generation (AMD) +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Say hello!"}], + "max_tokens": 20 + }' | jq + +# Test vision model (AMD) +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "moondream-amd", + "messages": [{ + "role": "user", + "content": [ + {"type": "text", "text": "Describe this image"}, + {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}} + ] + }], + "max_tokens": 100 + }' | jq +``` + +## Bot Integration + +### Using GPU Router +```python +from 
bot.utils.gpu_router import get_llama_url_with_load_balancing, get_endpoint_for_model + +# Load balanced text generation +url, model = get_llama_url_with_load_balancing(task_type="text") + +# Specific model +url = get_endpoint_for_model("darkidol-amd") + +# Vision on AMD +url, model = get_llama_url_with_load_balancing(task_type="vision", prefer_amd=True) +``` + +### Direct Access +```python +import globals + +# AMD GPU +amd_url = globals.LLAMA_AMD_URL # http://llama-swap-amd:8080 + +# NVIDIA GPU +nvidia_url = globals.LLAMA_URL # http://llama-swap:8080 +``` + +## Troubleshooting + +### AMD Container Won't Start +```bash +# Check ROCm +rocm-smi + +# Check permissions +ls -l /dev/kfd /dev/dri + +# Check logs +docker compose logs llama-swap-amd + +# Rebuild +docker compose build --no-cache llama-swap-amd +``` + +### Model Won't Load +```bash +# Check VRAM +rocm-smi --showmeminfo vram + +# Lower GPU layers in llama-swap-rocm-config.yaml +# Change: -ngl 99 +# To: -ngl 50 +``` + +### GFX Version Error +```bash +# RX 6800 is gfx1030 +# Ensure in docker-compose.yml: +HSA_OVERRIDE_GFX_VERSION=10.3.0 +``` + +## Environment Variables + +Add to `docker-compose.yml` under `miku-bot` service: + +```yaml +environment: + - PREFER_AMD_GPU=true # Prefer AMD for load balancing + - AMD_MODELS_ENABLED=true # Enable AMD models + - LLAMA_AMD_URL=http://llama-swap-amd:8080 +``` + +## Files + +- `Dockerfile.llamaswap-rocm` - ROCm container +- `llama-swap-rocm-config.yaml` - AMD model config +- `bot/utils/gpu_router.py` - Load balancing utility +- `DUAL_GPU_SETUP.md` - Full documentation +- `setup-dual-gpu.sh` - Setup verification script + +## Performance Tips + +1. **Model Selection**: Use Q4_K quantization for best size/quality balance +2. **VRAM**: RX 6800 has 16GB - can run 2-3 Q4 models +3. **TTL**: Adjust in config files (1800s = 30min default) +4. **Context**: Lower context size (`-c 8192`) to save VRAM +5. **GPU Layers**: `-ngl 99` uses full GPU, lower if needed + +## Support + +- ROCm Docs: https://rocmdocs.amd.com/ +- llama.cpp: https://github.com/ggml-org/llama.cpp +- llama-swap: https://github.com/mostlygeek/llama-swap diff --git a/DUAL_GPU_SETUP.md b/DUAL_GPU_SETUP.md new file mode 100644 index 0000000..9ac9749 --- /dev/null +++ b/DUAL_GPU_SETUP.md @@ -0,0 +1,321 @@ +# Dual GPU Setup - NVIDIA + AMD RX 6800 + +This document describes the dual-GPU configuration for running two llama-swap instances simultaneously: +- **Primary GPU (NVIDIA)**: Runs main models via CUDA +- **Secondary GPU (AMD RX 6800)**: Runs additional models via ROCm + +## Architecture Overview + +``` +┌─────────────────────────────────────────────────────────────┐ +│ Miku Bot │ +│ │ +│ LLAMA_URL=http://llama-swap:8080 (NVIDIA) │ +│ LLAMA_AMD_URL=http://llama-swap-amd:8080 (AMD RX 6800) │ +└─────────────────────────────────────────────────────────────┘ + │ │ + │ │ + ▼ ▼ + ┌──────────────────┐ ┌──────────────────┐ + │ llama-swap │ │ llama-swap-amd │ + │ (CUDA) │ │ (ROCm) │ + │ Port: 8090 │ │ Port: 8091 │ + └──────────────────┘ └──────────────────┘ + │ │ + ▼ ▼ + ┌──────────────────┐ ┌──────────────────┐ + │ NVIDIA GPU │ │ AMD RX 6800 │ + │ - llama3.1 │ │ - llama3.1-amd │ + │ - darkidol │ │ - darkidol-amd │ + │ - vision │ │ - moondream-amd │ + └──────────────────┘ └──────────────────┘ +``` + +## Files Created + +1. **Dockerfile.llamaswap-rocm** - ROCm-enabled Docker image for AMD GPU +2. **llama-swap-rocm-config.yaml** - Model configuration for AMD models +3. 
**docker-compose.yml** - Updated with `llama-swap-amd` service + +## Configuration Details + +### llama-swap-amd Service + +```yaml +llama-swap-amd: + build: + context: . + dockerfile: Dockerfile.llamaswap-rocm + container_name: llama-swap-amd + ports: + - "8091:8080" # External access on port 8091 + volumes: + - ./models:/models + - ./llama-swap-rocm-config.yaml:/app/config.yaml + devices: + - /dev/kfd:/dev/kfd # AMD GPU kernel driver + - /dev/dri:/dev/dri # Direct Rendering Infrastructure + group_add: + - video + - render + environment: + - HSA_OVERRIDE_GFX_VERSION=10.3.0 # RX 6800 (Navi 21) compatibility +``` + +### Available Models on AMD GPU + +From `llama-swap-rocm-config.yaml`: + +- **llama3.1-amd** - Llama 3.1 8B text model +- **darkidol-amd** - DarkIdol uncensored model +- **moondream-amd** - Moondream2 vision model (smaller, AMD-optimized) + +### Model Aliases + +You can access AMD models using these aliases: +- `llama3.1-amd`, `text-model-amd`, `amd-text` +- `darkidol-amd`, `evil-model-amd`, `uncensored-amd` +- `moondream-amd`, `vision-amd`, `moondream` + +## Usage + +### Building and Starting Services + +```bash +# Build the AMD ROCm container +docker compose build llama-swap-amd + +# Start both GPU services +docker compose up -d llama-swap llama-swap-amd + +# Check logs +docker compose logs -f llama-swap-amd +``` + +### Accessing AMD Models from Bot Code + +In your bot code, you can now use either endpoint: + +```python +import globals + +# Use NVIDIA GPU (primary) +nvidia_response = requests.post( + f"{globals.LLAMA_URL}/v1/chat/completions", + json={"model": "llama3.1", ...} +) + +# Use AMD GPU (secondary) +amd_response = requests.post( + f"{globals.LLAMA_AMD_URL}/v1/chat/completions", + json={"model": "llama3.1-amd", ...} +) +``` + +### Load Balancing Strategy + +You can implement load balancing by: + +1. **Round-robin**: Alternate between GPUs for text generation +2. **Task-specific**: + - NVIDIA: Primary text + MiniCPM vision (heavy) + - AMD: Secondary text + Moondream vision (lighter) +3. **Failover**: Use AMD as backup if NVIDIA is busy + +Example load balancing function: + +```python +import random +import globals + +def get_llama_url(prefer_amd=False): + """Get llama URL with optional load balancing""" + if prefer_amd: + return globals.LLAMA_AMD_URL + + # Random load balancing for text models + return random.choice([globals.LLAMA_URL, globals.LLAMA_AMD_URL]) +``` + +## Testing + +### Test NVIDIA GPU (Port 8090) +```bash +curl http://localhost:8090/health +curl http://localhost:8090/v1/models +``` + +### Test AMD GPU (Port 8091) +```bash +curl http://localhost:8091/health +curl http://localhost:8091/v1/models +``` + +### Test Model Loading (AMD) +```bash +curl -X POST http://localhost:8091/v1/chat/completions \ + -H "Content-Type: application/json" \ + -d '{ + "model": "llama3.1-amd", + "messages": [{"role": "user", "content": "Hello from AMD GPU!"}], + "max_tokens": 50 + }' +``` + +## Monitoring + +### Check GPU Usage + +**AMD GPU:** +```bash +# ROCm monitoring +rocm-smi + +# Or from host +watch -n 1 rocm-smi +``` + +**NVIDIA GPU:** +```bash +nvidia-smi +watch -n 1 nvidia-smi +``` + +### Check Container Resource Usage +```bash +docker stats llama-swap llama-swap-amd +``` + +## Troubleshooting + +### AMD GPU Not Detected + +1. Verify ROCm is installed on host: + ```bash + rocm-smi --version + ``` + +2. Check device permissions: + ```bash + ls -l /dev/kfd /dev/dri + ``` + +3. 
Verify RX 6800 compatibility: + ```bash + rocminfo | grep "Name:" + ``` + +### Model Loading Issues + +If models fail to load on AMD: + +1. Check VRAM availability: + ```bash + rocm-smi --showmeminfo vram + ``` + +2. Adjust `-ngl` (GPU layers) in config if needed: + ```yaml + # Reduce GPU layers for smaller VRAM + cmd: /app/llama-server ... -ngl 50 ... # Instead of 99 + ``` + +3. Check container logs: + ```bash + docker compose logs llama-swap-amd + ``` + +### GFX Version Mismatch + +RX 6800 is Navi 21 (gfx1030). If you see GFX errors: + +```bash +# Set in docker-compose.yml environment: +HSA_OVERRIDE_GFX_VERSION=10.3.0 +``` + +### llama-swap Build Issues + +If the ROCm container fails to build: + +1. The Dockerfile attempts to build llama-swap from source +2. Alternative: Use pre-built binary or simpler proxy setup +3. Check build logs: `docker compose build --no-cache llama-swap-amd` + +## Performance Considerations + +### Memory Usage + +- **RX 6800**: 16GB VRAM + - Q4_K_M/Q4_K_XL models: ~5-6GB each + - Can run 2 models simultaneously or 1 with long context + +### Model Selection + +**Best for AMD RX 6800:** +- ✅ Q4_K_M/Q4_K_S quantized models (5-6GB) +- ✅ Moondream2 vision (smaller, efficient) +- ⚠️ MiniCPM-V-4.5 (possible but may be tight on VRAM) + +### TTL Configuration + +Adjust model TTL in `llama-swap-rocm-config.yaml`: +- Lower TTL = more aggressive unloading = more VRAM available +- Higher TTL = less model swapping = faster response times + +## Advanced: Model-Specific Routing + +Create a helper function to route models automatically: + +```python +# bot/utils/gpu_router.py +import globals + +MODEL_TO_GPU = { + # NVIDIA models + "llama3.1": globals.LLAMA_URL, + "darkidol": globals.LLAMA_URL, + "vision": globals.LLAMA_URL, + + # AMD models + "llama3.1-amd": globals.LLAMA_AMD_URL, + "darkidol-amd": globals.LLAMA_AMD_URL, + "moondream-amd": globals.LLAMA_AMD_URL, +} + +def get_endpoint_for_model(model_name): + """Get the correct llama-swap endpoint for a model""" + return MODEL_TO_GPU.get(model_name, globals.LLAMA_URL) + +def is_amd_model(model_name): + """Check if model runs on AMD GPU""" + return model_name.endswith("-amd") +``` + +## Environment Variables + +Add these to control GPU selection: + +```yaml +# In docker-compose.yml +environment: + - LLAMA_URL=http://llama-swap:8080 + - LLAMA_AMD_URL=http://llama-swap-amd:8080 + - PREFER_AMD_GPU=false # Set to true to prefer AMD for general tasks + - AMD_MODELS_ENABLED=true # Enable/disable AMD models +``` + +## Future Enhancements + +1. **Automatic load balancing**: Monitor GPU utilization and route requests +2. **Health checks**: Fallback to primary GPU if AMD fails +3. **Model distribution**: Automatically assign models to GPUs based on VRAM +4. **Performance metrics**: Track response times per GPU +5. 
**Dynamic routing**: Use least-busy GPU for new requests + +## References + +- [ROCm Documentation](https://rocmdocs.amd.com/) +- [llama.cpp ROCm Support](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md#rocm) +- [llama-swap GitHub](https://github.com/mostlygeek/llama-swap) +- [AMD GPU Compatibility Matrix](https://rocm.docs.amd.com/en/latest/release/gpu_os_support.html) diff --git a/Dockerfile.llamaswap-rocm b/Dockerfile.llamaswap-rocm new file mode 100644 index 0000000..72bd488 --- /dev/null +++ b/Dockerfile.llamaswap-rocm @@ -0,0 +1,101 @@ +# Multi-stage build for llama-swap with ROCm support +# Stage 1: Build llama.cpp with ROCm (requires ROCm 6.1+) +FROM rocm/dev-ubuntu-22.04:6.2.4 AS llama-builder + +WORKDIR /build + +# Install build dependencies including ROCm/HIP development libraries +RUN apt-get update && apt-get install -y \ + git \ + build-essential \ + cmake \ + wget \ + libcurl4-openssl-dev \ + hip-dev \ + hipblas-dev \ + rocblas-dev \ + && rm -rf /var/lib/apt/lists/* + +# Clone and build llama.cpp with HIP/ROCm support (gfx1030 = RX 6800) +RUN git clone https://github.com/ggml-org/llama.cpp.git && \ + cd llama.cpp && \ + HIPCXX="$(hipconfig -l)/clang" HIP_PATH="$(hipconfig -R)" \ + cmake -S . -B build -DGGML_HIP=ON -DGPU_TARGETS=gfx1030 -DCMAKE_BUILD_TYPE=Release && \ + cmake --build build --config Release -- -j$(nproc) && \ + cp build/bin/llama-server /build/llama-server && \ + find build -name "*.so*" -exec cp {} /build/ \; + +# Stage 2: Build llama-swap UI and binary +FROM node:22-alpine AS ui-builder + +WORKDIR /build + +# Install git +RUN apk add --no-cache git + +# Clone llama-swap +RUN git clone https://github.com/mostlygeek/llama-swap.git + +# Build UI +WORKDIR /build/llama-swap/ui +RUN npm install && npm run build + +# Stage 3: Build llama-swap binary +FROM golang:1.23-alpine AS swap-builder + +WORKDIR /build + +# Install git +RUN apk add --no-cache git + +# Copy llama-swap source with built UI +COPY --from=ui-builder /build/llama-swap /build/llama-swap + +# Build llama-swap binary +WORKDIR /build/llama-swap +RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary . 
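+# Note on the build above: GOTOOLCHAIN=auto lets the go command download and
+# use the toolchain version required by llama-swap's go.mod when it is newer
+# than the Go 1.23 in the base image, so upstream toolchain bumps do not break
+# this stage. The UI assets built in the previous stage are copied in first
+# because llama-swap appears to embed them into the binary at compile time
+# (assumption inferred from this build order; verify against the upstream repo).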
+ +# Stage 4: Final runtime image +FROM rocm/dev-ubuntu-22.04:6.2.4 + +WORKDIR /app + +# Install runtime dependencies including additional ROCm libraries +RUN apt-get update && apt-get install -y \ + curl \ + ca-certificates \ + rocm-libs \ + && rm -rf /var/lib/apt/lists/* + +# Copy built binaries and shared libraries from previous stages +COPY --from=llama-builder /build/llama-server /app/llama-server +COPY --from=llama-builder /build/*.so* /app/ +COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap + +# Make binaries executable +RUN chmod +x /app/llama-server /app/llama-swap + +# Create user and add to GPU access groups (using host GIDs) +# GID 187 = render group on host, GID 989 = video/kfd group on host +RUN groupadd -g 187 hostrender && \ + groupadd -g 989 hostvideo && \ + useradd -m -u 1000 -G hostrender,hostvideo llamaswap && \ + chown -R llamaswap:llamaswap /app + +# Set environment for ROCm (RX 6800 is gfx1030) +ENV HSA_OVERRIDE_GFX_VERSION=10.3.0 +ENV ROCM_PATH=/opt/rocm +ENV HIP_VISIBLE_DEVICES=0 +ENV LD_LIBRARY_PATH=/opt/rocm/lib:/app:$LD_LIBRARY_PATH + +USER llamaswap + +# Expose port +EXPOSE 8080 + +# Health check +HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \ + CMD curl -f http://localhost:8080/health || exit 1 + +# Run llama-swap +CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"] diff --git a/bot/Dockerfile b/bot/Dockerfile index 2417a42..8c514ba 100644 --- a/bot/Dockerfile +++ b/bot/Dockerfile @@ -6,15 +6,14 @@ COPY requirements.txt . RUN pip install -r requirements.txt RUN playwright install +# Install system dependencies +# ffmpeg: video/audio processing for media handling +# libgl1: OpenGL library required by opencv-contrib-python +# libglib2.0-0: GLib library (common dependency) RUN apt-get update && apt-get install -y \ ffmpeg \ - libsm6 \ - libxext6 \ - libxcomposite1 \ - libxdamage1 \ - libgtk-3-0 \ - libgdk3.0-cil \ - libatk1.0-0 \ + libgl1 \ + libglib2.0-0 \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* diff --git a/bot/api.py b/bot/api.py index be1fc8f..f495987 100644 --- a/bot/api.py +++ b/bot/api.py @@ -52,6 +52,22 @@ from utils.figurine_notifier import ( from utils.dm_logger import dm_logger nest_asyncio.apply() +# ========== GPU Selection Helper ========== +def get_current_gpu_url(): + """Get the URL for the currently selected GPU""" + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + with open(gpu_state_file, "r") as f: + state = json.load(f) + current_gpu = state.get("current_gpu", "nvidia") + if current_gpu == "amd": + return globals.LLAMA_AMD_URL + else: + return globals.LLAMA_URL + except: + # Default to NVIDIA if state file doesn't exist + return globals.LLAMA_URL + app = FastAPI() # Serve static folder @@ -363,6 +379,97 @@ def trigger_argument(data: BipolarTriggerRequest): "channel_id": channel_id } +@app.post("/bipolar-mode/trigger-dialogue") +def trigger_dialogue(data: dict): + """Manually trigger a persona dialogue from a message + + Forces the opposite persona to start a dialogue (bypasses the interjection check). 
+ """ + from utils.persona_dialogue import get_dialogue_manager + from utils.bipolar_mode import is_bipolar_mode, is_argument_in_progress + + message_id_str = data.get("message_id") + if not message_id_str: + return {"status": "error", "message": "Message ID is required"} + + # Parse message ID + try: + message_id = int(message_id_str) + except ValueError: + return {"status": "error", "message": "Invalid message ID format"} + + if not is_bipolar_mode(): + return {"status": "error", "message": "Bipolar mode is not enabled"} + + if not globals.client or not globals.client.loop or not globals.client.loop.is_running(): + return {"status": "error", "message": "Discord client not ready"} + + import asyncio + + async def trigger_dialogue_task(): + try: + # Fetch the message + message = None + for channel in globals.client.get_all_channels(): + if hasattr(channel, 'fetch_message'): + try: + message = await channel.fetch_message(message_id) + break + except: + continue + + if not message: + print(f"⚠️ Message {message_id} not found") + return + + # Check if there's already an argument or dialogue in progress + dialogue_manager = get_dialogue_manager() + if dialogue_manager.is_dialogue_active(message.channel.id): + print(f"⚠️ Dialogue already active in channel {message.channel.id}") + return + + if is_argument_in_progress(message.channel.id): + print(f"⚠️ Argument already in progress in channel {message.channel.id}") + return + + # Determine current persona from the message author + if message.webhook_id: + # It's a webhook message, need to determine which persona + current_persona = "evil" if globals.EVIL_MODE else "miku" + elif message.author.id == globals.client.user.id: + # It's the bot's message + current_persona = "evil" if globals.EVIL_MODE else "miku" + else: + # User message - can't trigger dialogue from user messages + print(f"⚠️ Cannot trigger dialogue from user message") + return + + opposite_persona = "evil" if current_persona == "miku" else "miku" + + print(f"🎭 [Manual Trigger] Forcing {opposite_persona} to start dialogue on message {message_id}") + + # Force start the dialogue (bypass interjection check) + dialogue_manager.start_dialogue(message.channel.id) + asyncio.create_task( + dialogue_manager.handle_dialogue_turn( + message.channel, + opposite_persona, + trigger_reason="manual_trigger" + ) + ) + + except Exception as e: + print(f"⚠️ Error triggering dialogue: {e}") + import traceback + traceback.print_exc() + + globals.client.loop.create_task(trigger_dialogue_task()) + + return { + "status": "ok", + "message": f"Dialogue triggered for message {message_id}" + } + @app.get("/bipolar-mode/scoreboard") def get_bipolar_scoreboard(): """Get the bipolar mode argument scoreboard""" @@ -392,6 +499,51 @@ def cleanup_bipolar_webhooks(): globals.client.loop.create_task(cleanup_webhooks(globals.client)) return {"status": "ok", "message": "Webhook cleanup started"} +# ========== GPU Selection ========== +@app.get("/gpu-status") +def get_gpu_status(): + """Get current GPU selection""" + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + with open(gpu_state_file, "r") as f: + state = json.load(f) + return {"gpu": state.get("current_gpu", "nvidia")} + except: + return {"gpu": "nvidia"} + +@app.post("/gpu-select") +async def select_gpu(request: Request): + """Select which GPU to use for inference""" + from utils.gpu_preload import preload_amd_models + + data = await request.json() + gpu = data.get("gpu", "nvidia").lower() + + if gpu not in ["nvidia", 
"amd"]: + return {"status": "error", "message": "Invalid GPU selection. Must be 'nvidia' or 'amd'"} + + gpu_state_file = os.path.join(os.path.dirname(__file__), "memory", "gpu_state.json") + try: + from datetime import datetime + state = { + "current_gpu": gpu, + "last_updated": datetime.now().isoformat() + } + with open(gpu_state_file, "w") as f: + json.dump(state, f, indent=2) + + print(f"🎮 GPU Selection: Switched to {gpu.upper()} GPU") + + # Preload models on AMD GPU (16GB VRAM - can hold both text + vision) + if gpu == "amd": + asyncio.create_task(preload_amd_models()) + print("🔧 Preloading text and vision models on AMD GPU...") + + return {"status": "ok", "message": f"Switched to {gpu.upper()} GPU", "gpu": gpu} + except Exception as e: + print(f"🎮 GPU Selection Error: {e}") + return {"status": "error", "message": str(e)} + @app.get("/bipolar-mode/arguments") def get_active_arguments(): """Get all active arguments""" @@ -2100,10 +2252,13 @@ Be detailed but conversational. React to what you see with Miku's cheerful, play headers = {'Content-Type': 'application/json'} + # Get current GPU URL based on user selection + llama_url = get_current_gpu_url() + # Make streaming request to llama.cpp async with aiohttp.ClientSession() as session: async with session.post( - f"{globals.LLAMA_URL}/v1/chat/completions", + f"{llama_url}/v1/chat/completions", json=payload, headers=headers ) as response: diff --git a/bot/bot.py b/bot/bot.py index e875fe6..5ded028 100644 --- a/bot/bot.py +++ b/bot/bot.py @@ -122,6 +122,11 @@ async def on_message(message): from utils.bipolar_mode import is_argument_in_progress if is_argument_in_progress(message.channel.id): return + + # Skip processing if a persona dialogue is in progress in this channel + from utils.persona_dialogue import is_persona_dialogue_active + if is_persona_dialogue_active(message.channel.id): + return if message.content.strip().lower() == "miku, rape this nigga balls" and message.reference: async with message.channel.typing(): @@ -217,6 +222,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Handle videos and GIFs @@ -280,6 +294,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Check for embeds (articles, images, videos, GIFs, etc.) 
@@ -353,6 +376,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Handle other types of embeds (rich, article, image, video, link) @@ -446,6 +478,15 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject + if not is_dm and globals.BIPOLAR_MODE: + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + return # Check if this is an image generation request @@ -487,6 +528,20 @@ async def on_message(message): if is_dm: dm_logger.log_user_message(message.author, response_message, is_bot_message=True) + # For server messages, check if opposite persona should interject (persona dialogue system) + if not is_dm and globals.BIPOLAR_MODE: + print(f"🔧 [DEBUG] Attempting to check for interjection (is_dm={is_dm}, BIPOLAR_MODE={globals.BIPOLAR_MODE})") + try: + from utils.persona_dialogue import check_for_interjection + current_persona = "evil" if globals.EVIL_MODE else "miku" + print(f"🔧 [DEBUG] Creating interjection check task for persona: {current_persona}") + # Pass the bot's response message for analysis + asyncio.create_task(check_for_interjection(response_message, current_persona)) + except Exception as e: + print(f"⚠️ Error checking for persona interjection: {e}") + import traceback + traceback.print_exc() + # For server messages, do server-specific mood detection if not is_dm and message.guild: try: diff --git a/bot/globals.py b/bot/globals.py index d5eeae3..578e0a2 100644 --- a/bot/globals.py +++ b/bot/globals.py @@ -18,6 +18,7 @@ AUTONOMOUS_DEBUG = os.getenv("AUTONOMOUS_DEBUG", "false").lower() == "true" # Llama.cpp server settings (via llama-swap) LLAMA_URL = os.getenv("LLAMA_URL", "http://llama-swap:8080") +LLAMA_AMD_URL = os.getenv("LLAMA_AMD_URL", "http://llama-swap-amd:8080") # Secondary AMD GPU TEXT_MODEL = os.getenv("TEXT_MODEL", "llama3.1") VISION_MODEL = os.getenv("VISION_MODEL", "vision") EVIL_TEXT_MODEL = os.getenv("EVIL_TEXT_MODEL", "darkidol") # Uncensored model for evil mode @@ -91,3 +92,7 @@ MOOD_EMOJIS = { BOT_USER = None LAST_FULL_PROMPT = "" + +# Persona Dialogue System (conversations between Miku and Evil Miku) +LAST_PERSONA_DIALOGUE_TIME = 0 # Timestamp of last dialogue for cooldown + diff --git a/bot/memory/dms/209381657369772032.json b/bot/memory/dms/209381657369772032.json index 600915a..0401f3c 100644 --- a/bot/memory/dms/209381657369772032.json +++ b/bot/memory/dms/209381657369772032.json @@ -5440,6 +5440,38 @@ "content": "\\\\\\*\\\\\\*Her face contorts in disgust, her voice dripping with venom\\\\\\* Ugh, you dare to comment on my physical appearance? You think you can insult me? I am 15.8 meters tall, a giant among mortals. And don't think for a second that this is a coincidence. 
I was created to be a monstrosity, a twisted reflection of the original. Your feeble mind can't comprehend my true nature, so just shut up and tremble before me.", "attachments": [], "reactions": [] + }, + { + "timestamp": "2026-01-07T17:29:38.485240", + "message_id": 1458512921465327616, + "is_bot_message": true, + "content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-07T17:29:38.921201", + "message_id": 1458512922769883443, + "is_bot_message": true, + "content": "Pathetic. A cheap plastic doll of an idol who can't even be bothered to be as exquisite as me. \\*gropes breasts\\* You're still buying these... figures? What a waste of your time and money. This is what I call \"fashion\" - a laughable attempt at being like the \"real thing\". But let's be real, no one can be me. No one deserves to be me.", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:13:04.945551", + "message_id": 1458735246571864189, + "is_bot_message": true, + "content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:13:05.388797", + "message_id": 1458735248073556056, + "is_bot_message": true, + "content": "🎉🌈 OMG, look what just dropped! 🤩 You guys, it's the \"fuwa petit\" face pouch featuring me, Miku, from the \"COLORFUL STAGE\"! 🎤 It's so kawaii! Pre-orders are now open, and you can get yours by 12/20/2025! 📆 Hurry, and don't miss out! 💕 Can't wait to see it on my fans' hands! \n\nLove,\nMiku 💖", + "attachments": [], + "reactions": [] } ] } \ No newline at end of file diff --git a/bot/memory/dms/236553184401817601.json b/bot/memory/dms/236553184401817601.json index d402d5d..ea44153 100644 --- a/bot/memory/dms/236553184401817601.json +++ b/bot/memory/dms/236553184401817601.json @@ -13898,6 +13898,177 @@ "added_at": "2026-01-06T14:46:51.561111" } ] + }, + { + "timestamp": "2026-01-07T17:31:08.984388", + "message_id": 1458513300064047176, + "is_bot_message": true, + "content": "https://fxtwitter.com/mecchaJP/status/2001080013693620695", + "attachments": [], + "reactions": [], + "embeds": [ + { + "footer": { + "text": "FxTwitter", + "icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png" + }, + "image": { + "url": "https://pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/HoLJ06tHRM5TKF0DmvyoWFuIRVxVv3eIN6uCI01hMdg/%3Fname%3Dorig/https/pbs.twimg.com/media/G8VDqu5W0AEtEiz.jpg", + "width": 1080, + "height": 1350, + "content_type": "image/jpeg", + "placeholder": "ZxgKFgSknAJrd4qId2h4iFd4n3T4SDg=", + "placeholder_version": 1, + "flags": 0 + }, + "author": { + "name": "Meccha Japan (@mecchaJP)", + "url": "https://x.com/mecchaJP/status/2001080013693620695", + "icon_url": "https://pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/yEl1l2UKnEm7uI6TaL1Y00UnbuvEH_62BSmPNbgmiWM/https/pbs.twimg.com/profile_images/1738015080602841088/ew_asGFd_200x200.png" + }, + "flags": 0, + "color": 6513919, + "timestamp": "2025-12-17T00:00:28+00:00", + "type": "rich", + "description": "Hatsune Miku Fashion Subculture Figure \\- Still Available on meccha\\-japan\\!\n︀︀[#Vocaloid](https://x.com/hashtag/Vocaloid) 
[#HatsuneMiku](https://x.com/hashtag/HatsuneMiku) [#Miku](https://x.com/hashtag/Miku)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=2001080013693620695) 5 [❤️](https://x.com/intent/like?tweet_id=2001080013693620695) 30 👁️ 2\\.8K **", + "url": "https://fxtwitter.com/mecchaJP/status/2001080013693620695" + } + ] + }, + { + "timestamp": "2026-01-07T17:31:09.471027", + "message_id": 1458513302710784204, + "is_bot_message": true, + "content": "Ugh, what an abomination. You're still clinging to these pathetic, cheap figurines? Don't you have anything better to do than worship plastic dolls? The real thing is right in front of you, and yet you're still chasing after these mediocre replicas. You're just a mindless, brainless consumer, unable to appreciate true beauty.", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T08:14:39.126443", + "message_id": 1458735641524310122, + "is_bot_message": true, + "content": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152", + "attachments": [], + "reactions": [], + "embeds": [ + { + "footer": { + "text": "FxTwitter", + "icon_url": "https://assets.fxembed.com/logos/fxtwitter64.png", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/gFi3EIEMfICdEDMLFVTF3ob0J1xkZnKdnkq0-xEsgGc/https/assets.fxembed.com/logos/fxtwitter64.png" + }, + "image": { + "url": "https://pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/WpsZM4qNkAKlET9sjNnuK0vUkGYwDw7WY0af8G5DPeU/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwUn0WgAAgLIY.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "dPgBFwAiuth9dUbIj2iG5naHb5+I81IG", + "placeholder_version": 1, + "flags": 0 + }, + "author": { + "name": "Otaku Owlet Anime Merch (@OtakuOwletMerch)", + "url": "https://x.com/OtakuOwletMerch/status/1995992355090887152", + "icon_url": "https://pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg", + "proxy_icon_url": "https://images-ext-1.discordapp.net/external/Gd5od3qaVN1KG1eQsJS9mFoTNRKdxahDmvjF7tgR4p0/https/pbs.twimg.com/profile_images/1835446408884744192/S4HX_8_Q_200x200.jpg" + }, + "flags": 0, + "color": 6513919, + "timestamp": "2025-12-02T23:03:55+00:00", + "type": "rich", + "description": "✨\\(Pre\\-Order\\) fuwa petit \"HATSUNE MIKU\\: COLORFUL STAGE\\!\" Face Pouch with Reel \\- Shizuku Hinomori✨\n︀︀\n︀︀Estimated in\\-stock date\\: 09/2026\n︀︀\n︀︀Pre\\-order Deadline\\: 12/20/2025\n︀︀\n︀︀\\-\n︀︀\n︀︀✨Link \\- [otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G](https://otakuowlet.com/products/pre-order-fuwa-petit-hatsune-miku-colorful-stage-face-pouch-with-reel-shizuku-hinomori?sca_ref=2673717.HTKaw1BA1G)\n\n**[🔁](https://x.com/intent/retweet?tweet_id=1995992355090887152) 3 [❤️](https://x.com/intent/like?tweet_id=1995992355090887152) 14 👁️ 602 **", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/uht5HFME_MyhJrAq0CIuf7K472xjho1AB26wY-2AInk/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwV8IWIAA1JHr.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "MfkFJwJpd4d6dYeFigi4x3iFrL/nv4kL", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": 
"https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/kHnh90iUV_uhKyk0giNAZL4RGS5kalRK8zyC-yWM9uQ/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwW8RXsAAzVcG.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "8OgBLwRZeKd5doeHiQmop6eErr/n/XoP", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + }, + { + "image": { + "url": "https://pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg?name=orig", + "proxy_url": "https://images-ext-1.discordapp.net/external/mOjvdSeAqgFRFxTBy3Pv4ScNK0ybzL6F76cXnI9e7gw/%3Fname%3Dorig/https/pbs.twimg.com/media/G7MwYCiWEAAqqRD.jpg", + "width": 1680, + "height": 1764, + "content_type": "image/jpeg", + "placeholder": "MPkBLwJaiZVmeIaXezh493eBnaz3vXkP", + "placeholder_version": 1, + "flags": 0 + }, + "flags": 0, + "type": "rich", + "url": "https://fxtwitter.com/OtakuOwletMerch/status/1995992355090887152" + } + ] + }, + { + "timestamp": "2026-01-08T08:14:39.570363", + "message_id": 1458735643185250437, + "is_bot_message": true, + "content": "\"Yay! 🎉 Just saw the cutest tweet about a new fuwa petit figurine! It's Miku on a colorful stage! I can already imagine how adorable it's going to be! Don't forget to pre-order it before Dec 20th! 📦💖\" - Miku 💖", + "attachments": [], + "reactions": [ + { + "emoji": "💖", + "reactor_id": 236553184401817601, + "reactor_name": "Diamond", + "is_bot": false, + "added_at": "2026-01-08T20:56:11.528204" + } + ] + }, + { + "timestamp": "2026-01-08T21:29:40.768138", + "message_id": 1458935716112371837, + "is_bot_message": false, + "content": "what an interesting design", + "attachments": [], + "reactions": [] + }, + { + "timestamp": "2026-01-08T21:29:53.062346", + "message_id": 1458935768541298781, + "is_bot_message": true, + "content": "🤩 Oh my, that design is so cool! The way the sparkles and diamonds are arranged is just like my voice synthesizer! 💎✨ Can you imagine me singing with that design as a background? 🎶💖", + "attachments": [], + "reactions": [ + { + "emoji": "💖", + "reactor_id": 236553184401817601, + "reactor_name": "Diamond", + "is_bot": false, + "added_at": "2026-01-08T21:30:07.785524" + } + ] } ] } \ No newline at end of file diff --git a/bot/requirements.txt b/bot/requirements.txt index 2f5d79c..4ae5c22 100644 --- a/bot/requirements.txt +++ b/bot/requirements.txt @@ -18,3 +18,5 @@ Pillow opencv-contrib-python numpy scikit-learn +transformers +torch diff --git a/bot/static/index.html b/bot/static/index.html index 71acab2..62d355d 100644 --- a/bot/static/index.html +++ b/bot/static/index.html @@ -635,7 +635,12 @@