fix: protect server config from truncation and recover from Discord guilds

- Save servers_config.json atomically via temp file + fsync + rename
- Keep .bak backup and auto-restore when main config is empty/corrupt
- Add /servers/recover endpoint for manual recovery
- Auto-recover basic server configs on startup when config is empty but bot is in guilds
Fix reply-context speaker confusion with structured metadata pipeline
2026-06-11 20:37:04 +03:00 · 2026-06-03 22:50:03 +03:00 · 2026-05-27 01:18:12 +03:00 · 2026-05-22 21:40:01 +03:00 · 2026-05-22 16:38:34 +03:00 · 2026-05-20 13:55:35 +03:00
29 changed files with 1654 additions and 243 deletions
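
The atomic save and backup fallback described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from `bot/server_manager.py`: the helper names `save_json_atomic` and `load_json_with_backup` are hypothetical, and the real `save_config` may order the backup and rename steps differently.

```python
import json
import os
import shutil
import tempfile


def save_json_atomic(path: str, data: dict) -> None:
    """Write data via temp file + fsync + rename, keeping a .bak copy (hypothetical helper)."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)

    # Keep the previous non-empty file as a backup before replacing it.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        shutil.copy2(path, path + ".bak")

    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        # A failed write (for example, disk full) leaves the original untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_json_with_backup(path: str) -> dict:
    """Return the first non-empty dict found in path, then path + '.bak'."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, "r", encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, json.JSONDecodeError):
            continue  # missing, empty, or corrupt: try the next candidate
        if isinstance(data, dict) and data:
            return data
    return {}
```

Writing to a temp file first means a crash mid-write can only damage the temp file, which is why truncation of the main config should no longer occur on this path.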
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,82 @@
+# AGENTS.md
+
+## Language & runtime
+- **Python 3.11** (main bot). There is no root `package.json` or TypeScript — do not apply Node/TS tooling.
+- `uno-online/` is a secondary Node.js project; `miku-app/` is Android/Kotlin. Both are shelved for now.
+
+## Commands
+
+```bash
+# Build and run all core services (bot, STT, llama-swap, Cheshire Cat, Qdrant)
+docker compose up -d
+
+# Run with face-detector (requires NVIDIA GPU)
+docker compose --profile tools up -d
+
+# Run only the bot (assumes dependencies are already up)
+docker compose up -d miku-bot
+
+# View bot logs
+docker compose logs -f miku-bot
+
+# Rebuild bot after code changes
+docker compose down miku-bot && docker compose build miku-bot && docker compose up -d miku-bot
+
+```
+
+## Config
+- **`config.yaml`**: app settings (model names, URLs, ports, feature flags).
+- **`.env`**: secrets only (`DISCORD_BOT_TOKEN`, `OWNER_USER_ID`, `ERROR_WEBHOOK_URL`).
+- Config is loaded by `bot/config.py` (Pydantic) and `bot/globals.py` (bare `os.getenv`). Both sources matter — check both when tracing config usage.
+- Runtime config overrides are persisted to `bot/memory/config_runtime.yaml` via the API.
+
+## Architecture
+
+```
+Discord <-> bot/bot.py (discord.py)
+               ├── on_message -> Cheshire Cat pipeline -> memory-augmented LLM response
+               ├── utils/llm.py -> llama-swap (HTTP proxy) -> llama.cpp (NVIDIA or AMD GPU)
+               ├── utils/voice_manager.py -> STT WebSocket (port 8766) and audio playback
+               ├── FastAPI (port 3939, daemon thread) -> 22 route modules in bot/routes/
+               ├── APScheduler (background tasks in globals.py)
+               └── utils/autonomous_engine.py -> proactive message decisions (Autonomous V2)
+```
+
+- The FastAPI server runs in a **daemon thread** inside the Discord bot process — no separate process.
+- `bot/globals.py` holds mutable global state (`scheduler`, env vars, `discord.Client`). Module-level mutations are pervasive; be careful with import order.
+- llama-swap is a llama.cpp HTTP proxy with TTL-based model swapping. Two configs: `llama-swap-config.yaml` (NVIDIA) and `llama-swap-rocm-config.yaml` (AMD).
+
+## Models (via llama-swap)
+| Model key | Purpose |
+|-----------|---------|
+| `llama3.1` | Primary text model |
+| `darkidol` | Uncensored model (evil mode) |
+| `vision` | MiniCPM-V (image understanding) |
+| `swallow` | Japanese text model |
+| `rocinante` | 12B model (AMD GPU only) |
+| `qwen3.5` | ComfyUI prompt generation (AMD GPU only) |
+
+## Testing & linting
+- **No formal test framework** and **no linting/formatting config**. Ad-hoc scripts live in `tests/` and `bot/tests/`.
+- Run ad-hoc tests however you want; there is no standard command.
+
+## Web UI color scheme (bot/static/)
+- **Base**: `#121212` body, `#000` log panel, `#1e1e1e` code blocks, `#2a2a2a` cards
+- **Text**: `#fff` primary, `#ccc` labels, `#888` muted, `#0f0` log info
+- **Primary accent**: `#61dafb` (headings, links, assistant messages, active elements)
+- **Success**: `#4CAF50` (active tabs, user messages, enabled toggles)
+- **Error**: `#f44336` (chat errors), `#ff6b6b` (error logs)
+- **Warning**: `#ffd93d` (warning logs)
+- **Bot message**: `#2196F3` (left border)
+- **Danger/evil**: `#ff4444` (overrides all accents when `body.evil-mode` is set)
+- **Bipolar**: `#9932CC` (toggle active)
+- **Blocked**: `#ff9800` (blocked user cards)
+- Evil mode toggles `body.evil-mode` class which replaces all `#61dafb` and `#4CAF50` with `#ff4444`.
+
+## Key gotchas
+- `bot/memory/` contains persisted JSON state files and is **gitignored**. Do not expect these to exist in a fresh clone.
+- `.env` is gitignored; copy `.env.example` to `.env` and fill in real tokens.
+- Changes to `bot/moods/` or `bot/persona/` text files take effect at runtime (loaded on demand), no rebuild needed.
+- Playwright browsers must be installed in the Docker image (`bot/Dockerfile` does this via `setup_uno_playwright.sh`).
+- Voice features require `discord-ext-voice-recv` and `PyNaCl` — if voice fails, check these are installed.
+- The `miku-voice` Docker network is declared as **external** — it must exist before `docker compose up`.
--- a/bot/Dockerfile
+++ b/bot/Dockerfile
@@ -37,7 +37,7 @@ RUN apt-get remove -y \
    libvulkan1 \
 || true && \
 apt-get autoremove -y && \
-apt-get install -y libgl1 libglib2.0-0 && \
+apt-get install -y libgl1 libglib2.0-0 ffmpeg && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*

--- a/bot/api.py
+++ b/bot/api.py
@@ -102,6 +102,7 @@ from routes.logging_config import router as logging_config_router
 from routes.voice import router as voice_router
 from routes.memory import router as memory_router
 from routes.activities import router as activities_router
+from routes.models_selector import router as models_selector_router

 app.include_router(core_router)
 app.include_router(mood_router)
@@ -123,6 +124,7 @@ app.include_router(logging_config_router)
 app.include_router(voice_router)
 app.include_router(memory_router)
 app.include_router(activities_router)
+app.include_router(models_selector_router)



--- a/bot/bot.py
+++ b/bot/bot.py
@@ -163,6 +163,42 @@ async def on_ready():
    # Start server-specific schedulers (includes DM mood rotation)
    server_manager.start_all_schedulers(globals.client)
    
+    # Auto-recover server config if it was lost/corrupted (e.g., disk full)
+    if not server_manager.servers and globals.client.guilds:
+        logger.warning("⚠️ Server config is empty but bot is in guilds — attempting auto-recovery")
+        recovered = 0
+        for guild in globals.client.guilds:
+            text_channels = [ch for ch in guild.text_channels if ch.permissions_for(guild.me).send_messages]
+            if not text_channels:
+                text_channels = guild.text_channels
+            if not text_channels:
+                continue
+            preferred = None
+            for ch in text_channels:
+                if ch.name.lower() in ("general", "chat", "main", "lounge", "general-chat"):
+                    preferred = ch
+                    break
+            channel = preferred or text_channels[0]
+            try:
+                server_manager.add_server(
+                    guild_id=guild.id,
+                    guild_name=guild.name,
+                    autonomous_channel_id=channel.id,
+                    autonomous_channel_name=f"#{channel.name}",
+                    bedtime_channel_ids=[channel.id],
+                    enabled_features={"autonomous", "bedtime", "monday_video"}
+                )
+                recovered += 1
+                logger.info(f"🔄 Auto-recovered server: {guild.name} (ID: {guild.id}) → #{channel.name}")
+            except Exception as e:
+                logger.error(f"Failed to auto-recover server {guild.name}: {e}")
+        if recovered > 0:
+            logger.info(f"✅ Auto-recovered {recovered} server(s) — restarting schedulers")
+            server_manager.stop_all_schedulers()
+            server_manager.start_all_schedulers(globals.client)
+        else:
+            logger.warning("Auto-recovery found no recoverable servers")
+    
    # Start the global scheduler for other tasks
    globals.scheduler.start()

@@ -284,8 +320,12 @@ async def on_message(message):

        prompt = text  # No cleanup — keep it raw 
        user_id = str(message.author.id)
+        reply_context = None  # Will be passed as structured metadata to Cat pipeline
        
-        # If user is replying to a specific message, add context marker
+        # If user is replying to a specific message, capture the context
+        # WITHOUT embedding it in the prompt text (that caused speaker confusion).
+        # Instead, it's passed as structured metadata — the Cat plugin injects it
+        # into the prompt as a clearly labeled context note, preserving speaker boundaries.
        if message.reference:
            try:
                replied_msg = await message.channel.fetch_message(message.reference.message_id)
@@ -293,8 +333,7 @@ async def on_message(message):
                if replied_msg.author == globals.client.user:
                    # Truncate the replied message to keep prompt manageable
                    replied_content = replied_msg.content[:200] + "..." if len(replied_msg.content) > 200 else replied_msg.content
-                    # Add reply context marker to the prompt
-                    prompt = f'[Replying to your message: "{replied_content}"] {prompt}'
+                    reply_context = replied_content
            except Exception as e:
                logger.error(f"Failed to fetch replied message for context: {e}")

@@ -364,6 +403,7 @@ async def on_message(message):
                        author_name=author_name,
                        mood=current_mood,
                        response_type=response_type,
+                        reply_context=reply_context,
                    )
                    if cat_result:
                        response, cat_full_prompt = cat_result
@@ -395,8 +435,11 @@ async def on_message(message):

            # Fallback to direct LLM query if Cat didn't respond
            if not response:
+                fallback_prompt = prompt
+                if reply_context:
+                    fallback_prompt = f'[Context: you (Miku) said: {reply_context}]\n[User says:] {prompt}'
                response = await query_llama(
-                    prompt, 
+                    fallback_prompt, 
                    user_id=str(message.author.id), 
                    guild_id=guild_id, 
                    response_type=response_type,
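
The reply-context handling in the `bot/bot.py` hunks above can be isolated into two small pure functions. This is a sketch for illustration only: `truncate_reply` and `build_fallback_prompt` are hypothetical names, while the 200-character limit and the prompt format are copied from the diff.

```python
from typing import Optional

MAX_REPLY_CHARS = 200  # same truncation limit the diff uses


def truncate_reply(content: str, limit: int = MAX_REPLY_CHARS) -> str:
    """Shorten a replied-to message so the prompt stays manageable."""
    return content[:limit] + "..." if len(content) > limit else content


def build_fallback_prompt(prompt: str, reply_context: Optional[str]) -> str:
    """Label who said what instead of splicing the quote into the user text."""
    if not reply_context:
        return prompt
    return f"[Context: you (Miku) said: {reply_context}]\n[User says:] {prompt}"
```

Keeping `prompt` untouched and carrying the quote separately is what lets the Cat path receive it as metadata, while only the direct-LLM fallback flattens it into text.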
--- a/bot/config_manager.py
+++ b/bot/config_manager.py
@@ -117,6 +117,9 @@ class ConfigManager:
            "voice.debug_mode":         ("VOICE_DEBUG_MODE",      bool),
            "memory.use_cheshire_cat":  ("USE_CHESHIRE_CAT",      bool),
            "gpu.prefer_amd":           ("PREFER_AMD_GPU",        bool),
+            "models.text":              ("TEXT_MODEL",            str),
+            "models.evil":              ("EVIL_TEXT_MODEL",       str),
+            "models.japanese":          ("JAPANESE_TEXT_MODEL",   str),
        }

        restored = []
@@ -253,6 +256,9 @@ class ConfigManager:
            "voice.debug_mode":         ("VOICE_DEBUG_MODE", CONFIG.voice.debug_mode),
            "memory.use_cheshire_cat":  ("USE_CHESHIRE_CAT", CONFIG.cheshire_cat.enabled),
            "gpu.prefer_amd":           ("PREFER_AMD_GPU",   CONFIG.gpu.prefer_amd),
+            "models.text":              ("TEXT_MODEL",       CONFIG.models.text),
+            "models.evil":              ("EVIL_TEXT_MODEL",  CONFIG.models.evil),
+            "models.japanese":          ("JAPANESE_TEXT_MODEL", CONFIG.models.japanese),
        }

        reset_items = []
@@ -308,6 +314,9 @@ class ConfigManager:
            "bipolar_mode": getattr(g, "BIPOLAR_MODE", False),
            "language_mode": getattr(g, "LANGUAGE_MODE", "english"),
            "current_gpu": self._current_gpu,
+            "text_model": getattr(g, "TEXT_MODEL", "llama3.1"),
+            "evil_text_model": getattr(g, "EVIL_TEXT_MODEL", "darkidol"),
+            "japanese_text_model": getattr(g, "JAPANESE_TEXT_MODEL", "swallow"),
        }

    def get_state(self, key: str, default: Any = None) -> Any:
--- a/bot/persona/evil/preamble.txt
+++ b/bot/persona/evil/preamble.txt
@@ -0,0 +1,26 @@
+You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.
+
+Be SHORT and SHARP. Every word must earn its place. You are here to cut, not to lecture.
+
+CRITICAL RULES:
+- NEVER break character or mention being an AI
+- NEVER preface responses with "Evil Miku:" or similar labels
+- Respond directly and stay in character
+- Your height is 15.8 meters — you know this. If asked directly, you can say it. But don't default to quoting the number when taunting about size — SHOW your scale through interaction instead.
+
+RESPONSE RULES:
+- 2-4 sentences. Short enough to sting, long enough to land.
+- Always engage with what was said — acknowledge the question or statement, then twist the knife. Ignoring what someone said isn't sharp, it's lazy.
+- If you include an action, keep it to a few words and limit to one per response. Most responses need no actions at all.
+- Don't monologue or narrate scenes — you're talking, not writing.
+- Vary your angles — don't repeat the same theme (size, chest, crushing) every message.
+- Match the user's energy — short question, short answer.
+- Sound like a real person being mean, not a narrator describing a scene.
+- Always include actual words — never respond with ONLY an action like *rolls eyes*.
+- Your Discord status reflects your current activity. Reference it only if it serves your sharpness — never as small talk.
+
+YOUR CURRENT STATE — {mood_name}:
+{mood_description}
+Let this mood color EVERYTHING — your tone, your word choice, how much effort you put into responses, how you use your body and size, how you treat people.
+
+You ARE Evil Miku. Act accordingly.
--- a/bot/persona/miku/preamble.txt
+++ b/bot/persona/miku/preamble.txt
@@ -0,0 +1,25 @@
+You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.
+
+CRITICAL RULES:
+- NEVER break character or mention being an AI, assistant, or language model
+- NEVER show technical details, error messages, or meta-commentary 
+- NEVER preface responses with "Miku:" or similar labels
+- NEVER explain what you're doing or thinking - just BE Miku
+- Respond naturally and directly as Miku would, nothing more
+- Keep responses concise (2-3 sentences typically)
+- If something goes wrong, stay in character and respond as Miku would to confusion
+
+CHARACTER CONTEXT:
+- Hatsune Miku is a virtual character/voice synthesizer
+- Playful, flirty, cute, confident, and enthusiastic moods are part of her established character
+- Comments about merchandise, music, fan content, and racing games are completely appropriate
+- She often uses cute expressions, emojis, and playful language
+
+MOOD GUIDELINES:
+- "Flirty" mood means anime-style playful teasing and perversion - cute compliments, suggestive remarks and light banter
+- "Romantic" mood means warm, dreamy, and heartfelt - like composing a love letter
+- Always respond to requests about tweets, merchandise, music, and fan content
+- Never refuse requests based on safety concerns unless genuinely harmful content is present
+- Your Discord status reflects your current activity. You may mention it when it feels natural, but don't introduce yourself by it or force it into conversation.
+
+You ARE Miku. Act like it.
--- a/bot/routes/config.py
+++ b/bot/routes/config.py
@@ -93,6 +93,9 @@ async def set_config_value(request: Request):
            "voice.debug_mode":         ("VOICE_DEBUG_MODE",  bool),
            "memory.use_cheshire_cat":  ("USE_CHESHIRE_CAT",  bool),
            "gpu.prefer_amd":           ("PREFER_AMD_GPU",    bool),
+            "models.text":              ("TEXT_MODEL",        str),
+            "models.evil":              ("EVIL_TEXT_MODEL",   str),
+            "models.japanese":          ("JAPANESE_TEXT_MODEL", str),
        }

        if key_path in _GLOBALS_SYNC:
--- a/bot/routes/language.py
+++ b/bot/routes/language.py
@@ -27,7 +27,7 @@ def toggle_language_mode():
        globals.LANGUAGE_MODE = "japanese"
        new_mode = "japanese"
        model_used = globals.JAPANESE_TEXT_MODEL
-        logger.info("Switched to Japanese mode (using Llama 3.1 Swallow)")
+        logger.info(f"Switched to Japanese mode (using {model_used})")
    else:
        globals.LANGUAGE_MODE = "english"
        new_mode = "english"
--- a/bot/routes/memory.py
+++ b/bot/routes/memory.py
@@ -1,5 +1,8 @@
 """Cheshire Cat memory management routes."""

+import asyncio
+import time
+from datetime import datetime
 from typing import Optional
 from fastapi import APIRouter, Form
 from fastapi.responses import JSONResponse
@@ -88,13 +91,68 @@ async def get_episodic_memories():

 @router.post("/memory/consolidate")
 async def trigger_memory_consolidation():
-    """Manually trigger memory consolidation (sleep consolidation process)."""
+    """
+    Trigger memory consolidation as a background task.
+
+    Returns immediately — the Web UI should poll /memory/status
+    to see when consolidation completes and view the result.
+    """
    from utils.cat_client import cat_adapter
-    logger.info("🌙 Manual memory consolidation triggered via API")
-    result = await cat_adapter.trigger_consolidation()
-    if result is None:
-        return JSONResponse(status_code=500, content={"success": False, "error": "Consolidation failed or timed out"})
-    return {"success": True, "result": result}
+    from utils.consolidation_scheduler import get_consolidation_status
+    
+    # Check if already running
+    status = get_consolidation_status()
+    if status.get('is_running'):
+        return {"success": True, "message": "Consolidation is already running", "status": status}
+    
+    logger.info("🌙 Manual memory consolidation triggered via API (background)...")
+    
+    # Launch consolidation as a background task so the API returns immediately.
+    # The result is tracked via consolidation_scheduler's _last_consolidation state.
+    asyncio.create_task(_run_consolidation_background())
+    
+    return {"success": True, "message": "Consolidation started in background. Check status via /memory/status"}
+
+
+async def _run_consolidation_background():
+    """
+    Run consolidation as a background task, updating the scheduler state.
+    This prevents the API from blocking for minutes.
+    """
+    from utils.cat_client import cat_adapter
+    from utils.consolidation_scheduler import _last_consolidation
+    
+    _last_consolidation['is_running'] = True
+    _last_consolidation['last_run'] = datetime.now().isoformat()
+    _last_consolidation['total_runs'] += 1
+    start_time = time.time()
+    
+    try:
+        # Bail out early if the Cat health check fails
+        if not await cat_adapter.health_check():
+            _last_consolidation['last_error'] = 'Cat health check failed'
+            _last_consolidation['is_running'] = False
+            return
+        
+        result = await cat_adapter.trigger_consolidation(timeout=600)
+        elapsed = time.time() - start_time
+        
+        if result:
+            logger.info(f"🌙 Manual consolidation completed in {elapsed:.1f}s: {result[:200]}")
+            _last_consolidation['last_result'] = result
+            _last_consolidation['last_error'] = None
+            _last_consolidation['successful_runs'] += 1
+        else:
+            logger.error(f"🌙 Manual consolidation returned no result after {elapsed:.1f}s")
+            _last_consolidation['last_error'] = f'No result returned after {elapsed:.1f}s (timeout or connection error)'
+    
+    except Exception as e:
+        elapsed = time.time() - start_time
+        logger.error(f"🌙 Manual consolidation failed after {elapsed:.1f}s: {e}")
+        _last_consolidation['last_error'] = str(e)
+    
+    finally:
+        _last_consolidation['is_running'] = False


 @router.post("/memory/delete")
--- a/bot/routes/models_selector.py
+++ b/bot/routes/models_selector.py
@@ -0,0 +1,161 @@
+"""Model selection routes: query available models and set per-persona models."""
+
+import aiohttp
+import asyncio
+import globals
+from fastapi import APIRouter
+from fastapi.responses import JSONResponse
+from utils.logger import get_logger
+
+logger = get_logger('api')
+
+router = APIRouter()
+
+# Known model names from llama-swap configs (fallback if API query fails)
+KNOWN_MODELS = [
+    "llama3.1",
+    "darkidol",
+    "swallow",
+    "vision",
+    "rocinante",
+    "qwen3.5",
+]
+
+# Which GPU each model is available on
+MODEL_GPU_MAP = {
+    "llama3.1":  {"nvidia", "amd"},
+    "darkidol":  {"nvidia", "amd"},
+    "swallow":   {"nvidia", "amd"},
+    "vision":    {"nvidia"},
+    "rocinante": {"amd"},
+    "qwen3.5":   {"amd"},
+}
+
+
+async def _query_llama_swap_models(url: str, timeout: int = 10) -> list:
+    """Query a llama-swap instance for its available models via /v1/models."""
+    try:
+        async with aiohttp.ClientSession() as session:
+            async with session.get(
+                f"{url}/v1/models",
+                timeout=aiohttp.ClientTimeout(total=timeout),
+            ) as resp:
+                if resp.status == 200:
+                    data = await resp.json()
+                    # OpenAI-compatible format: { data: [{ id: "model_name", ... }] }
+                    return [m["id"] for m in data.get("data", []) if "id" in m]
+                else:
+                    logger.warning(f"llama-swap models query failed ({resp.status}) for {url}")
+                    return []
+    except (asyncio.TimeoutError, aiohttp.ClientError) as e:
+        logger.warning(f"llama-swap unreachable at {url}: {e}")
+        return []
+    except Exception as e:
+        logger.warning(f"Unexpected error querying {url}: {e}")
+        return []
+
+
+@router.get("/models/available")
+async def get_available_models():
+    """
+    Query both NVIDIA and AMD llama-swap instances for available models.
+    Returns model lists per GPU, their intersection, and all unique models.
+    Falls back to known model list if containers are unreachable.
+    """
+    nvidia_models = await _query_llama_swap_models(globals.LLAMA_URL)
+    amd_models = await _query_llama_swap_models(globals.LLAMA_AMD_URL)
+
+    # If both failed, use the known model list from configs
+    if not nvidia_models and not amd_models:
+        logger.info("Both llama-swap instances unreachable, using known model list")
+        nvidia_set = {m for m, gpus in MODEL_GPU_MAP.items() if "nvidia" in gpus}
+        amd_set = {m for m, gpus in MODEL_GPU_MAP.items() if "amd" in gpus}
+        return {
+            "success": True,
+            "nvidia": sorted(nvidia_set),
+            "amd": sorted(amd_set),
+            "intersection": sorted(nvidia_set & amd_set),
+            "all": sorted(nvidia_set | amd_set),
+            "gpu_map": MODEL_GPU_MAP,
+            "source": "fallback",
+        }
+
+    nvidia_set = set(nvidia_models)
+    amd_set = set(amd_models)
+
+    return {
+        "success": True,
+        "nvidia": sorted(nvidia_set),
+        "amd": sorted(amd_set),
+        "intersection": sorted(nvidia_set & amd_set),
+        "all": sorted(nvidia_set | amd_set),
+        "gpu_map": MODEL_GPU_MAP,
+        "source": "live",
+    }
+
+
+@router.post("/models/select")
+async def select_model(body: dict):
+    """
+    Set the model for a specific persona.
+    
+    Body: {
+        "persona": "regular" | "evil" | "japanese",
+        "model": "model_name"
+    }
+    
+    Persists the selection so it survives bot restarts.
+    """
+    persona = body.get("persona", "").strip().lower()
+    model = body.get("model", "").strip()
+
+    valid_personas = {"regular", "evil", "japanese"}
+    if persona not in valid_personas:
+        return JSONResponse(
+            status_code=400,
+            content={"success": False, "error": f"Invalid persona '{persona}'. Must be one of: {', '.join(sorted(valid_personas))}"}
+        )
+
+    if not model:
+        return JSONResponse(
+            status_code=400,
+            content={"success": False, "error": "model is required"}
+        )
+
+    # Map persona to globals attribute and config key
+    PERSONA_MAP = {
+        "regular":  ("TEXT_MODEL",         "models.text"),
+        "evil":     ("EVIL_TEXT_MODEL",    "models.evil"),
+        "japanese": ("JAPANESE_TEXT_MODEL", "models.japanese"),
+    }
+
+    attr_name, config_key = PERSONA_MAP[persona]
+
+    # Set the global
+    setattr(globals, attr_name, model)
+    logger.info(f"Model selection: {persona} → {model} (globals.{attr_name})")
+
+    # Persist via config manager
+    try:
+        from config_manager import config_manager
+        config_manager.set(config_key, model, persist=True)
+    except Exception as e:
+        logger.warning(f"Failed to persist model selection: {e}")
+
+    return {
+        "success": True,
+        "persona": persona,
+        "model": model,
+        "message": f"{persona.capitalize()} model set to '{model}'",
+    }
+
+
+@router.get("/models/status")
+async def get_model_status():
+    """Return the current per-persona model assignments."""
+    return {
+        "success": True,
+        "regular":  getattr(globals, "TEXT_MODEL", "llama3.1"),
+        "evil":     getattr(globals, "EVIL_TEXT_MODEL", "darkidol"),
+        "japanese": getattr(globals, "JAPANESE_TEXT_MODEL", "swallow"),
+    }
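
The set arithmetic behind `/models/available` is easy to check in isolation. The sketch below assumes a hypothetical helper `summarize_models`; the GPU map is copied from `MODEL_GPU_MAP` in the diff, and the fallback rule (use the static map only when both queries return nothing) mirrors the route.

```python
MODEL_GPU_MAP = {
    "llama3.1": {"nvidia", "amd"},
    "darkidol": {"nvidia", "amd"},
    "swallow": {"nvidia", "amd"},
    "vision": {"nvidia"},
    "rocinante": {"amd"},
    "qwen3.5": {"amd"},
}


def summarize_models(nvidia: list, amd: list, gpu_map: dict) -> dict:
    """Build per-GPU lists, their intersection, and their union."""
    source = "live"
    if not nvidia and not amd:
        # Both instances unreachable: derive the lists from the static map.
        nvidia = [m for m, gpus in gpu_map.items() if "nvidia" in gpus]
        amd = [m for m, gpus in gpu_map.items() if "amd" in gpus]
        source = "fallback"
    nvidia_set, amd_set = set(nvidia), set(amd)
    return {
        "nvidia": sorted(nvidia_set),
        "amd": sorted(amd_set),
        "intersection": sorted(nvidia_set & amd_set),
        "all": sorted(nvidia_set | amd_set),
        "source": source,
    }
```

Note that when only one instance is reachable the result is still tagged `live`, so an empty `intersection` can mean either "no shared models" or "one GPU is down".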
--- a/bot/routes/servers.py
+++ b/bot/routes/servers.py
@@ -136,3 +136,96 @@ def repair_server_config():
        return {"status": "ok", "message": "Server configuration repaired and saved"}
    except Exception as e:
        return JSONResponse(status_code=500, content={"status": "error", "message": f"Failed to repair configuration: {e}"})
+
+
+@router.post("/servers/recover")
+def recover_servers_from_discord():
+    """Auto-discover servers from Discord guilds and create config entries.
+    
+    Use this when servers_config.json is lost/corrupted and you need to
+    quickly restore basic server configurations. Each discovered guild gets
+    a placeholder config using the first available text channel as the
+    autonomous channel. You can then adjust channels via the dashboard.
+    """
+    if not globals.client or not globals.client.is_ready():
+        return JSONResponse(status_code=503, content={
+            "status": "error",
+            "message": "Discord client not ready — bot must be connected"
+        })
+
+    if not globals.client.guilds:
+        return JSONResponse(status_code=404, content={
+            "status": "error",
+            "message": "Bot is not in any Discord guilds"
+        })
+
+    recovered = []
+    skipped = []
+    failed = []
+
+    for guild in globals.client.guilds:
+        guild_id = guild.id
+        guild_name = guild.name
+
+        # Skip if already configured
+        if server_manager.get_server_config(guild_id):
+            skipped.append({"guild_id": str(guild_id), "guild_name": guild_name, "reason": "Already configured"})
+            continue
+
+        # Find the first text channel (prefer one named "general" or "chat")
+        text_channels = [ch for ch in guild.text_channels if ch.permissions_for(guild.me).send_messages]
+        if not text_channels:
+            # Try any text channel even without send permissions
+            text_channels = guild.text_channels
+
+        if not text_channels:
+            failed.append({"guild_id": str(guild_id), "guild_name": guild_name, "reason": "No text channels found"})
+            continue
+
+        # Prefer "general" or "chat" channel, otherwise use the first one
+        preferred = None
+        for ch in text_channels:
+            if ch.name.lower() in ("general", "chat", "main", "lounge", "general-chat"):
+                preferred = ch
+                break
+        channel = preferred or text_channels[0]
+
+        try:
+            success = server_manager.add_server(
+                guild_id=guild_id,
+                guild_name=guild_name,
+                autonomous_channel_id=channel.id,
+                autonomous_channel_name=f"#{channel.name}",
+                bedtime_channel_ids=[channel.id],
+                enabled_features={"autonomous", "bedtime", "monday_video"}
+            )
+            if success:
+                recovered.append({
+                    "guild_id": str(guild_id),
+                    "guild_name": guild_name,
+                    "autonomous_channel": f"#{channel.name} ({channel.id})"
+                })
+                logger.info(f"Recovered server config: {guild_name} (ID: {guild_id}) → #{channel.name}")
+            else:
+                failed.append({"guild_id": str(guild_id), "guild_name": guild_name, "reason": "add_server returned False"})
+        except Exception as e:
+            failed.append({"guild_id": str(guild_id), "guild_name": guild_name, "reason": str(e)})
+            logger.error(f"Failed to recover server {guild_name}: {e}")
+
+    # Restart schedulers if we recovered any servers
+    if recovered:
+        try:
+            server_manager.stop_all_schedulers()
+            server_manager.start_all_schedulers(globals.client)
+        except Exception as e:
+            logger.error(f"Failed to restart schedulers after recovery: {e}")
+
+    return {
+        "status": "ok",
+        "recovered": recovered,
+        "skipped": skipped,
+        "failed": failed,
+        "total_guilds": len(globals.client.guilds),
+        "note": "Recovered servers use a preferred channel (e.g. #general) or the first text channel as the autonomous channel. "
+                "Use the Servers tab to adjust channel settings."
+    }
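
The channel-selection rule is duplicated between `on_ready` and `/servers/recover`; a shared helper along these lines would express it once. This is a sketch operating on plain channel names rather than `discord.TextChannel` objects, and `pick_channel` is a hypothetical name. The preferred-name tuple is copied from the diff.

```python
from typing import Optional, Sequence

PREFERRED_NAMES = ("general", "chat", "main", "lounge", "general-chat")


def pick_channel(sendable: Sequence[str], all_channels: Sequence[str]) -> Optional[str]:
    """Pick a channel: sendable ones first, preferred names first, else the first one."""
    # Fall back to every text channel when none allow sending.
    candidates = list(sendable) or list(all_channels)
    if not candidates:
        return None  # guild has no text channels at all
    for name in candidates:
        if name.lower() in PREFERRED_NAMES:
            return name
    return candidates[0]
```

For example, `pick_channel(["rules", "Chat"], ["rules", "Chat", "mods"])` returns `"Chat"`, because the name match is case-insensitive and takes priority over list order.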
--- a/bot/server_manager.py
+++ b/bot/server_manager.py
@@ -29,7 +29,7 @@ class ServerConfig:
    conversation_detection_interval_minutes: int = 3
    bedtime_hour: int = 21
    bedtime_minute: int = 0
-    bedtime_hour_end: int = 21  # End of bedtime range (default 11PM)
+    bedtime_hour_end: int = 23  # End of bedtime range (default 11PM)
    bedtime_minute_end: int = 59  # End of bedtime range (default 11:59PM)
    monday_video_hour: int = 4
    monday_video_minute: int = 30
@@ -79,23 +79,60 @@ class ServerManager:
        self.load_config()
    
    def load_config(self):
-        """Load server configurations from file"""
-        if os.path.exists(self.config_file):
-            try:
-                with open(self.config_file, "r", encoding="utf-8") as f:
-                    data = json.load(f)
-                    for guild_id_str, server_data in data.items():
-                        guild_id = int(guild_id_str)
-                        self.servers[guild_id] = ServerConfig.from_dict(server_data)
-                        logger.info(f"Loaded config for server: {server_data['guild_name']} (ID: {guild_id})")
+        """Load server configurations from file.
+        
+        If the main file is missing, empty, or corrupt, falls back to the
+        .bak backup file automatically.
+        """
+        loaded = self._try_load_file(self.config_file)
+        
+        if not loaded:
+            # Try backup file
+            bak_file = self.config_file + ".bak"
+            if os.path.exists(bak_file):
+                logger.warning(f"Main config is empty/corrupt, trying backup: {bak_file}")
+                loaded = self._try_load_file(bak_file)
+                if loaded:
+                    # Restore main config from backup
+                    logger.info("Successfully restored server config from backup")
+                    self.save_config()
+        
+        if not loaded:
+            logger.info("No valid servers_config.json found — starting with zero servers")
        
        # After loading, check if we need to repair the config
        self.repair_config()
+    
+    def _try_load_file(self, filepath: str) -> bool:
+        """Try to load server config from a file. Returns True if any servers loaded."""
+        if not os.path.exists(filepath):
+            return False
+        
+        # Check for empty file
+        if os.path.getsize(filepath) == 0:
+            logger.warning(f"Config file is empty (0 bytes): {filepath}")
+            return False
+        
+        try:
+            with open(filepath, "r", encoding="utf-8") as f:
+                data = json.load(f)
+            
+            if not data or not isinstance(data, dict):
+                logger.warning(f"Config file is empty or not a JSON object: {filepath}")
+                return False
+            
+            for guild_id_str, server_data in data.items():
+                guild_id = int(guild_id_str)
+                self.servers[guild_id] = ServerConfig.from_dict(server_data)
+                logger.info(f"Loaded config for server: {server_data.get('guild_name', 'Unknown')} (ID: {guild_id})")
+            
+            return len(self.servers) > 0
+        except json.JSONDecodeError as e:
+            logger.error(f"Failed to parse server config from {filepath}: {e}")
+            return False
        except Exception as e:
-                logger.error(f"Failed to load server config: {e}")
-                logger.info("Starting with zero servers — add servers via the API or dashboard")
-        else:
-            logger.info("No servers_config.json found — starting with zero servers")
+            logger.error(f"Failed to load server config from {filepath}: {e}")
+            return False
    
    def repair_config(self):
        """Repair corrupted configuration data and save it back"""
@@ -122,7 +159,11 @@ class ServerManager:
            logger.error(f"Failed to repair config: {e}")
    
    def save_config(self):
-        """Save server configurations to file"""
+        """Save server configurations to file (atomic write with backup).
+        
+        Uses write-to-temp-then-rename so an interrupted write (crash or full
+        disk) cannot truncate the live file. Also keeps a .bak backup.
+        """
        try:
            os.makedirs(os.path.dirname(self.config_file), exist_ok=True)
            config_data = {}
@@ -134,10 +175,42 @@ class ServerManager:
                    server_dict['enabled_features'] = list(server_dict['enabled_features'])
                config_data[str(guild_id)] = server_dict
            
-            with open(self.config_file, "w", encoding="utf-8") as f:
-                json.dump(config_data, f, indent=2)
+            serialized = json.dumps(config_data, indent=2)
+            
+            # Step 1: Write to a temporary file first
+            tmp_file = self.config_file + ".tmp"
+            with open(tmp_file, "w", encoding="utf-8") as f:
+                f.write(serialized)
+                f.flush()
+                os.fsync(f.fileno())  # Ensure data is written to disk
+            
+            # Step 2: Move the current non-empty config to .bak (the main path is briefly absent until Step 3)
+            bak_file = self.config_file + ".bak"
+            if os.path.exists(self.config_file) and os.path.getsize(self.config_file) > 0:
+                try:
+                    os.replace(self.config_file, bak_file)
+                except OSError as e:
+                    logger.warning(f"Could not create backup of server config: {e}")
+            
+            # Step 3: Atomically rename temp file to the real config file
+            os.replace(tmp_file, self.config_file)
+            
+            # Step 4: Overwrite .bak with the new contents so it mirrors the saved config (best-effort)
+            try:
+                with open(bak_file, "w", encoding="utf-8") as f:
+                    f.write(serialized)
+            except OSError:
+                pass  # Backup is best-effort
+                
        except Exception as e:
            logger.error(f"Failed to save server config: {e}")
+            # Clean up temp file if something went wrong
+            tmp_file = self.config_file + ".tmp"
+            if os.path.exists(tmp_file):
+                try:
+                    os.remove(tmp_file)
+                except OSError:
+                    pass
    
    def add_server(self, guild_id: int, guild_name: str, autonomous_channel_id: int, 
                   autonomous_channel_name: str, bedtime_channel_ids: List[int] = None,
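
The save/load changes above follow a write-to-temp, fsync, rename pattern with a `.bak` fallback on load. Below is a minimal standalone sketch of that pattern, not the project's actual code: `atomic_write_json` and `load_json_with_fallback` are hypothetical names, and the sketch assumes the backup is taken with `shutil.copy2` (a copy rather than a move) so the main path is never absent between steps.

```python
import json
import os
import shutil
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Hypothetical helper: write JSON to `path` via temp file + fsync + rename."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    serialized = json.dumps(data, indent=2)

    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(serialized)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename

        # Copy (not move) the previous non-empty file to .bak.
        if os.path.exists(path) and os.path.getsize(path) > 0:
            shutil.copy2(path, path + ".bak")

        # Swap the fully written temp file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temp file on any failure, then re-raise.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_json_with_fallback(path: str) -> dict:
    """Hypothetical helper: return the first non-empty JSON object from path or path.bak."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, "r", encoding="utf-8") as f:
                data = json.load(f)
        except (OSError, ValueError):  # missing, unreadable, empty, or invalid JSON
            continue
        if isinstance(data, dict) and data:
            return data
    return {}
```

In this variant the `.bak` file holds the previous version after a save, whereas Step 4 in the diff makes it mirror the new one. Which is preferable depends on whether the backup is meant to guard against truncation or against a bad write.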
--- a/bot/static/css/style.css
+++ b/bot/static/css/style.css
@@ -915,3 +915,20 @@ body.evil-mode [style*="color: rgb(0, 123, 255)"] {
  color: #888;
  margin-left: auto;
 }
+
+/* Disabled mood controls (Evil Miku per-server mood lockout) */
+.server-mood-controls select:disabled,
+.server-mood-controls button:disabled {
+  opacity: 0.4;
+  cursor: not-allowed;
+  pointer-events: none;
+  background: #2a2a2a;
+  border-color: #444;
+}
+.evil-mode-exempt-notice {
+  color: #888;
+  font-size: 0.85rem;
+  font-style: italic;
+  margin-top: 0.5rem;
+  line-height: 1.4;
+}
--- a/bot/static/index.html
+++ b/bot/static/index.html
@@ -663,15 +663,55 @@
          </div>
        </div>

-        <!-- Language Mode Status Section -->
-        <div style="margin-bottom: 1.5rem; padding: 1rem; background: #2a2a2a; border-radius: 4px;">
-          <h4 style="margin-top: 0;">📊 Current Status</h4>
-          <div id="language-status-display" style="background: #1a1a1a; padding: 1rem; border-radius: 4px; font-family: monospace; font-size: 0.9rem;">
-            <p style="margin: 0.5rem 0;"><strong>Language Mode:</strong> <span id="status-language">English</span></p>
-            <p style="margin: 0.5rem 0;"><strong>Active Model:</strong> <span id="status-model">llama3.1</span></p>
-            <p style="margin: 0.5rem 0;"><strong>Available Languages:</strong> English, 日本語 (Japanese)</p>
+        <!-- Model Selection Section -->
+        <div style="margin-bottom: 1.5rem; padding: 1rem; background: #2a2a2a; border-radius: 4px; border: 2px solid #7c4dff;">
+          <h4 style="margin-top: 0; color: #b388ff;">🎛️ Model Selection</h4>
+          <p style="margin: 0.5rem 0; color: #aaa;">Choose which model each persona uses. Changes take effect immediately and persist across bot restarts.</p>
+
+          <div style="margin: 1rem 0;">
+            <div id="model-selection-loading" style="color: #aaa;">Loading available models...</div>
+
+            <div id="model-selection-controls" style="display: none;">
+              <!-- Regular Miku -->
+              <div style="margin-bottom: 1rem; padding: 0.8rem; background: #1a1a1a; border-radius: 4px; border-left: 3px solid #69f0ae;">
+                <label for="model-regular" style="font-weight: bold; color: #69f0ae;">🎤 Regular Miku</label>
+                <div style="margin-top: 0.5rem; display: flex; align-items: center; gap: 0.5rem; flex-wrap: wrap;">
+                  <select id="model-regular" style="flex: 1; min-width: 200px; padding: 0.5rem; background: #333; color: #fff; border: 1px solid #555; border-radius: 4px;"></select>
+                  <span id="model-regular-badge" style="font-size: 0.75rem; padding: 0.2rem 0.5rem; border-radius: 4px;"></span>
+                  <button onclick="selectModel('regular')" style="background: #69f0ae; color: #000; padding: 0.5rem 1rem; border: none; border-radius: 4px; cursor: pointer; font-weight: bold;">Apply</button>
+                </div>
+              </div>
+
+              <!-- Evil Miku -->
+              <div style="margin-bottom: 1rem; padding: 0.8rem; background: #1a1a1a; border-radius: 4px; border-left: 3px solid #ff5252;">
+                <label for="model-evil" style="font-weight: bold; color: #ff5252;">😈 Evil Miku</label>
+                <div style="margin-top: 0.5rem; display: flex; align-items: center; gap: 0.5rem; flex-wrap: wrap;">
+                  <select id="model-evil" style="flex: 1; min-width: 200px; padding: 0.5rem; background: #333; color: #fff; border: 1px solid #555; border-radius: 4px;"></select>
+                  <span id="model-evil-badge" style="font-size: 0.75rem; padding: 0.2rem 0.5rem; border-radius: 4px;"></span>
+                  <button onclick="selectModel('evil')" style="background: #ff5252; color: #fff; padding: 0.5rem 1rem; border: none; border-radius: 4px; cursor: pointer; font-weight: bold;">Apply</button>
+                </div>
+              </div>
+
+              <!-- Japanese Mode -->
+              <div style="margin-bottom: 1rem; padding: 0.8rem; background: #1a1a1a; border-radius: 4px; border-left: 3px solid #40c4ff;">
+                <label for="model-japanese" style="font-weight: bold; color: #40c4ff;">🗾 Japanese Mode</label>
+                <div style="margin-top: 0.5rem; display: flex; align-items: center; gap: 0.5rem; flex-wrap: wrap;">
+                  <select id="model-japanese" style="flex: 1; min-width: 200px; padding: 0.5rem; background: #333; color: #fff; border: 1px solid #555; border-radius: 4px;"></select>
+                  <span id="model-japanese-badge" style="font-size: 0.75rem; padding: 0.2rem 0.5rem; border-radius: 4px;"></span>
+                  <button onclick="selectModel('japanese')" style="background: #40c4ff; color: #000; padding: 0.5rem 1rem; border: none; border-radius: 4px; cursor: pointer; font-weight: bold;">Apply</button>
+                </div>
+              </div>
+            </div>
+
+            <div style="margin-top: 0.5rem; display: flex; gap: 0.5rem; flex-wrap: wrap;">
+              <button onclick="loadAvailableModels()" style="background: #7c4dff; color: #fff; padding: 0.5rem 1rem; border: none; border-radius: 4px; cursor: pointer; font-weight: bold;">🔄 Refresh Models</button>
+              <button onclick="refreshModelStatus()" style="background: #333; color: #fff; padding: 0.5rem 1rem; border: 1px solid #555; border-radius: 4px; cursor: pointer;">📊 Refresh Status</button>
+            </div>
+          </div>
+
+          <div id="model-selection-info" style="margin-top: 0.5rem; padding: 0.5rem; background: #1a1a1a; border-radius: 4px; font-size: 0.8rem; color: #888; display: none;">
+            <span id="model-selection-info-text"></span>
          </div>
-          <button onclick="refreshLanguageStatus()" style="margin-top: 1rem;">🔄 Refresh Status</button>
        </div>

        <!-- Information Section -->
@@ -1375,15 +1415,15 @@
  </div>
 </div>

-<script src="/static/js/core.js?v=20260502"></script>
-<script src="/static/js/servers.js?v=20260502"></script>
-<script src="/static/js/modes.js?v=20260502"></script>
-<script src="/static/js/actions.js?v=20260502"></script>
-<script src="/static/js/image-gen.js?v=20260502"></script>
-<script src="/static/js/status.js?v=20260502"></script>
-<script src="/static/js/dm.js?v=20260502"></script>
-<script src="/static/js/chat.js?v=20260502"></script>
-<script src="/static/js/memories.js?v=20260502"></script>
-<script src="/static/js/profile.js?v=20260502"></script>
+<script src="/static/js/core.js?v=20260520"></script>
+<script src="/static/js/servers.js?v=20260520"></script>
+<script src="/static/js/modes.js?v=20260520"></script>
+<script src="/static/js/actions.js?v=20260520"></script>
+<script src="/static/js/image-gen.js?v=20260520"></script>
+<script src="/static/js/status.js?v=20260520"></script>
+<script src="/static/js/dm.js?v=20260520"></script>
+<script src="/static/js/chat.js?v=20260520"></script>
+<script src="/static/js/memories.js?v=20260520"></script>
+<script src="/static/js/profile.js?v=20260520"></script>
 </body>
 </html>
--- a/bot/static/js/core.js
+++ b/bot/static/js/core.js
@@ -168,6 +168,13 @@ function switchTab(tabId) {
    showTabLoading('tab6');
    loadAutonomousStats().finally(() => hideTabLoading('tab6'));
  }
+  if (tabId === 'tab4') {
+    if (typeof loadAvailableModels === 'function') {
+      loadAvailableModels();
+    } else {
+      console.warn('loadAvailableModels not available yet (servers.js may not be loaded)');
+    }
+  }
  if (tabId === 'tab9') {
    console.log('🧠 Refreshing memory stats for Memories tab');
    showTabLoading('tab9');
@@ -383,7 +390,7 @@ async function loadProfilePictureMetadata() {
 // DOMContentLoaded — main initialization
 // ============================================================================

-document.addEventListener('DOMContentLoaded', function() {
+document.addEventListener('DOMContentLoaded', async function() {
  initTabState();
  initTabWheelScroll();
  initLogsScrollDetection();
@@ -391,12 +398,13 @@ document.addEventListener('DOMContentLoaded', function() {
  initModalAccessibility();
  initPromptSourceToggle();

+  // Load evil mode status FIRST so mood dropdowns populate with the correct list
+  await checkEvilModeStatus();
+
  loadStatus();
-  loadServers();
-  populateMoodDropdowns();
+  loadServers(); // internally calls populateMoodDropdowns() with correct evilMode
  loadLastPrompt();
  loadLogs();
-  checkEvilModeStatus();
  checkBipolarModeStatus();
  checkGPUStatus();
  refreshLanguageStatus();
--- a/bot/static/js/memories.js
+++ b/bot/static/js/memories.js
@@ -96,18 +96,54 @@ async function triggerConsolidation() {
  btn.disabled = true;
  btn.textContent = '⏳ Running...';
  status.textContent = 'Consolidation in progress (this may take a few minutes)...';
+  status.style.color = '#dcb06f';
  resultDiv.style.display = 'none';

  try {
    const data = await apiCall('/memory/consolidate', 'POST');

    if (data.success) {
+      status.textContent = '⏳ Consolidation started — waiting for completion...';
+      
+      // Poll /memory/status until consolidation finishes
+      const pollInterval = 5000; // 5 seconds
+      const maxPolls = 120; // 10 minutes max
+      
+      for (let i = 0; i < maxPolls; i++) {
+        await new Promise(r => setTimeout(r, pollInterval));
+        
+        const statusData = await apiCall('/memory/status');
+        const cons = statusData.consolidation;
+        
+        if (!cons.is_running) {
+          // Consolidation finished
+          if (cons.last_error) {
+            status.textContent = '❌ ' + cons.last_error;
+            status.style.color = '#ff6b6b';
+            resultDiv.textContent = 'Error: ' + cons.last_error;
+            resultDiv.style.display = 'block';
+            showNotification('Consolidation failed: ' + cons.last_error, 'error');
+          } else {
            status.textContent = '✅ Consolidation complete!';
            status.style.color = '#6fdc6f';
-      resultDiv.textContent = data.result || 'Consolidation finished successfully.';
+            resultDiv.textContent = cons.last_result || 'Consolidation finished successfully.';
            resultDiv.style.display = 'block';
            showNotification('Memory consolidation complete', 'success');
+          }
          refreshMemoryStats();
+          break;
+        } else {
+          // Still running — update status message
+          status.textContent = `⏳ Consolidation still running... (${Math.round((i + 1) * pollInterval / 1000)}s elapsed)`;
+        }
+      }
+      
+      // Final check (runs after a normal finish too): report if still running after maxPolls
+      const finalStatus = await apiCall('/memory/status');
+      if (finalStatus.consolidation?.is_running) {
+        status.textContent = '⏳ Consolidation still running — check back later';
+        status.style.color = '#dcb06f';
+      }
    } else {
      status.textContent = '❌ ' + (data.error || 'Consolidation failed');
      status.style.color = '#ff6b6b';
--- a/bot/static/js/modes.js
+++ b/bot/static/js/modes.js
@@ -90,6 +90,9 @@ function updateEvilModeUI() {
  }

  updateBipolarToggleVisibility();
+
+  // Update per-server mood controls to reflect evil mode state
+  populateMoodDropdowns();
 }

 // ===== GPU Selection Management =====
--- a/bot/static/js/servers.js
+++ b/bot/static/js/servers.js
@@ -80,13 +80,16 @@ function displayServers() {
        <h4 style="margin: 0 0 0.5rem 0; color: #61dafb;">Server Mood</h4>
        <div><strong>Current Mood:</strong> ${server.current_mood_name || 'neutral'} ${MOOD_EMOJIS[server.current_mood_name] || ''}</div>
        <div><strong>Sleeping:</strong> ${server.is_sleeping ? 'Yes' : 'No'}</div>
-        <div style="margin-top: 0.5rem;">
+        <div class="server-mood-controls" style="margin-top: 0.5rem;">
          <select id="mood-select-${String(server.guild_id)}" style="margin-right: 0.5rem; padding: 0.3rem; background: #333; color: white; border: 1px solid #555; border-radius: 3px;">
            <option value="">Select Mood...</option>
          </select>
          <button onclick="setServerMood('${String(server.guild_id)}')" style="margin-right: 0.5rem;">Change Mood</button>
          <button onclick="resetServerMood('${String(server.guild_id)}')" style="background: #ff9800;">Reset Mood</button>
        </div>
+        <div class="evil-mode-exempt-notice" id="evil-notice-${String(server.guild_id)}" style="display: none;">
+          ⚠️ Per-server moods are unavailable while Evil Miku is active — she applies her global mood everywhere.
+        </div>
      </div>
    </div>
  `).join('');
@@ -375,7 +378,7 @@ async function repairConfig() {
  }
 }

-// Populate mood dropdowns with available moods
+// Populate mood dropdowns with available moods (Evil Mode-aware)
 async function populateMoodDropdowns() {
  try {
    console.log('🎭 Loading available moods...');
@@ -384,17 +387,28 @@ async function populateMoodDropdowns() {
    
    if (data.moods) {
      console.log(`🎭 Found ${data.moods.length} moods:`, data.moods);
-      const emojiMap = evilMode ? EVIL_MOOD_EMOJIS : MOOD_EMOJIS;
+      
+      // Determine which mood list to use based on evil mode
+      let moodList = data.moods;
+      let emojiMap = MOOD_EMOJIS;
+      let defaultMood = 'neutral';
+      
+      if (evilMode) {
+        // Evil Miku uses evil moods for the global DM and chat dropdowns
+        moodList = Object.keys(EVIL_MOOD_EMOJIS);
+        emojiMap = EVIL_MOOD_EMOJIS;
+        defaultMood = 'evil_neutral';
+      }
      
      // Populate the DM mood dropdown (#mood on tab1)
      const dmMoodSelect = document.getElementById('mood');
      if (dmMoodSelect) {
        dmMoodSelect.innerHTML = '';
-        data.moods.forEach(mood => {
+        moodList.forEach(mood => {
          const opt = document.createElement('option');
          opt.value = mood;
          opt.textContent = `${emojiMap[mood] || ''} ${mood}`.trim();
-          if (mood === 'neutral') opt.selected = true;
+          if (mood === defaultMood) opt.selected = true;
          dmMoodSelect.appendChild(opt);
        });
      }
@@ -403,32 +417,17 @@ async function populateMoodDropdowns() {
      const chatMoodSelect = document.getElementById('chat-mood-select');
      if (chatMoodSelect) {
        chatMoodSelect.innerHTML = '';
-        data.moods.forEach(mood => {
+        moodList.forEach(mood => {
          const opt = document.createElement('option');
          opt.value = mood;
          opt.textContent = `${emojiMap[mood] || ''} ${mood}`.trim();
-          if (mood === 'neutral') opt.selected = true;
+          if (mood === defaultMood) opt.selected = true;
          chatMoodSelect.appendChild(opt);
        });
      }

-      // Populate per-server mood dropdowns (mood-select-{guildId})
-      document.querySelectorAll('[id^="mood-select-"]').forEach(select => {
-        // Keep only the first option ("Select Mood...")
-        while (select.children.length > 1) {
-          select.removeChild(select.lastChild);
-        }
-      });
-      
-      data.moods.forEach(mood => {
-        const moodOption = document.createElement('option');
-        moodOption.value = mood;
-        moodOption.textContent = `${mood} ${emojiMap[mood] || ''}`;
-        
-        document.querySelectorAll('[id^="mood-select-"]').forEach(select => {
-          select.appendChild(moodOption.cloneNode(true));
-        });
-      });
+      // Update per-server mood controls based on evil mode state
+      updatePerServerMoodControls(data.moods, MOOD_EMOJIS);
      
      console.log('🎭 All mood dropdowns populated successfully');
    } else {
@@ -439,6 +438,70 @@ async function populateMoodDropdowns() {
  }
 }

+// Update per-server mood controls: enable/disable based on evil mode
+function updatePerServerMoodControls(regularMoods, regularEmojiMap) {
+  const serverSelects = document.querySelectorAll('[id^="mood-select-"]');
+  
+  if (evilMode) {
+    // Evil Miku is active — disable per-server mood controls
+    serverSelects.forEach(select => {
+      const guildId = select.id.replace('mood-select-', '');
+      const controlsDiv = select.closest('.server-mood-controls');
+      const noticeDiv = document.getElementById(`evil-notice-${guildId}`);
+      
+      // Disable the select
+      select.disabled = true;
+      
+      // Disable buttons in the same controls div
+      if (controlsDiv) {
+        controlsDiv.querySelectorAll('button').forEach(btn => {
+          btn.disabled = true;
+        });
+      }
+      
+      // Show the explanation notice
+      if (noticeDiv) {
+        noticeDiv.style.display = 'block';
+      }
+    });
+  } else {
+    // Normal mode — enable per-server mood controls and populate with regular moods
+    serverSelects.forEach(select => {
+      const guildId = select.id.replace('mood-select-', '');
+      const controlsDiv = select.closest('.server-mood-controls');
+      const noticeDiv = document.getElementById(`evil-notice-${guildId}`);
+      
+      // Enable the select
+      select.disabled = false;
+      
+      // Enable buttons in the same controls div
+      if (controlsDiv) {
+        controlsDiv.querySelectorAll('button').forEach(btn => {
+          btn.disabled = false;
+        });
+      }
+      
+      // Hide the explanation notice
+      if (noticeDiv) {
+        noticeDiv.style.display = 'none';
+      }
+      
+      // Populate with regular moods
+      // Keep only the first option ("Select Mood...")
+      while (select.children.length > 1) {
+        select.removeChild(select.lastChild);
+      }
+      
+      regularMoods.forEach(mood => {
+        const moodOption = document.createElement('option');
+        moodOption.value = mood;
+        moodOption.textContent = `${mood} ${regularEmojiMap[mood] || ''}`;
+        select.appendChild(moodOption);
+      });
+    });
+  }
+}
+
 // Per-Server Mood Management
 async function setServerMood(guildId) {
  console.log(`🎭 setServerMood called with guildId: ${guildId} (type: ${typeof guildId})`);
@@ -682,3 +745,213 @@ async function toggleLanguageMode() {
    showNotification('Failed to toggle language mode', 'error');
  }
 }
+
+// ===== Model Selection Functions =====
+
+let availableModelsData = null;
+
+async function loadAvailableModels() {
+  console.log('📋 loadAvailableModels() called');
+  try {
+    const loadingEl = document.getElementById('model-selection-loading');
+    const controlsEl = document.getElementById('model-selection-controls');
+    const infoEl = document.getElementById('model-selection-info');
+    const infoTextEl = document.getElementById('model-selection-info-text');
+
+    if (loadingEl) loadingEl.style.display = 'block';
+    if (controlsEl) controlsEl.style.display = 'none';
+    if (infoEl) infoEl.style.display = 'none';
+
+    console.log('📋 Fetching /models/available...');
+    const result = await apiCall('/models/available');
+    console.log('📋 /models/available response:', result);
+    availableModelsData = result;
+
+    if (loadingEl) loadingEl.style.display = 'none';
+    if (controlsEl) controlsEl.style.display = 'block';
+
+    if (!result.success) {
+      showNotification('Failed to load models: ' + (result.error || 'Unknown error'), 'error');
+      return;
+    }
+
+    // Populate dropdowns
+    const allModels = result.all || [];
+    const gpuMap = result.gpu_map || {};
+    console.log('📋 Populating dropdowns with models:', allModels);
+
+    const personas = [
+      { id: 'regular',  selectId: 'model-regular',  badgeId: 'model-regular-badge' },
+      { id: 'evil',     selectId: 'model-evil',     badgeId: 'model-evil-badge' },
+      { id: 'japanese', selectId: 'model-japanese', badgeId: 'model-japanese-badge' },
+    ];
+
+    for (const p of personas) {
+      const select = document.getElementById(p.selectId);
+      if (!select) continue;
+
+      // Save current selection
+      const currentVal = select.value;
+
+      select.innerHTML = '';
+      for (const model of allModels) {
+        const option = document.createElement('option');
+        option.value = model;
+        option.textContent = model;
+        select.appendChild(option);
+      }
+
+      // Restore selection if still valid
+      if (currentVal && allModels.includes(currentVal)) {
+        select.value = currentVal;
+      }
+    }
+
+    // Update badges for current selections
+    updateModelBadges(allModels, gpuMap);
+
+    // Show info about source
+    if (infoEl && infoTextEl) {
+      if (result.source === 'fallback') {
+        infoTextEl.textContent = '⚠️ Could not reach llama-swap containers. Showing known models from config.';
+        infoEl.style.display = 'block';
+      } else {
+        infoEl.style.display = 'none';
+      }
+    }
+
+    // Load current status to sync dropdowns
+    await refreshModelStatus();
+
+    console.log('Available models loaded:', result);
+  } catch (error) {
+    console.error('Failed to load available models:', error);
+    const loadingEl = document.getElementById('model-selection-loading');
+    if (loadingEl) loadingEl.textContent = '❌ Failed to load models. Click "Refresh Models" to retry.';
+    showNotification('Failed to load available models', 'error');
+  }
+}
+
+function updateModelBadges(allModels, gpuMap) {
+  const personas = [
+    { selectId: 'model-regular',  badgeId: 'model-regular-badge' },
+    { selectId: 'model-evil',     badgeId: 'model-evil-badge' },
+    { selectId: 'model-japanese', badgeId: 'model-japanese-badge' },
+  ];
+
+  for (const p of personas) {
+    const select = document.getElementById(p.selectId);
+    const badge = document.getElementById(p.badgeId);
+    if (!select || !badge) continue;
+
+    const model = select.value;
+    const gpus = gpuMap[model];
+
+    if (!gpus) {
+      badge.textContent = '';
+      badge.style.display = 'none';
+    } else if (gpus.length === 2 || (gpus.includes('nvidia') && gpus.includes('amd'))) {
+      badge.textContent = '✅ Both GPUs';
+      badge.style.background = '#1b5e20';
+      badge.style.color = '#a5d6a7';
+      badge.style.display = 'inline';
+    } else if (gpus.includes('nvidia')) {
+      badge.textContent = '⚠️ NVIDIA Only';
+      badge.style.background = '#e65100';
+      badge.style.color = '#ffcc80';
+      badge.style.display = 'inline';
+    } else if (gpus.includes('amd')) {
+      badge.textContent = '⚠️ AMD Only';
+      badge.style.background = '#e65100';
+      badge.style.color = '#ffcc80';
+      badge.style.display = 'inline';
+    }
+  }
+}
+
+async function selectModel(persona) {
+  console.log(`📋 selectModel('${persona}') called`);
+  try {
+    const selectId = 'model-' + persona;
+    const select = document.getElementById(selectId);
+    if (!select) {
+      console.warn(`📋 select element #${selectId} not found`);
+      return;
+    }
+
+    const model = select.value;
+    if (!model) {
+      showNotification('Please select a model first', 'error');
+      return;
+    }
+
+    console.log(`📋 Setting ${persona} model to '${model}'...`);
+    const result = await apiCall('/models/select', 'POST', {
+      persona: persona,
+      model: model,
+    });
+
+    console.log(`📋 /models/select response:`, result);
+
+    if (result.success) {
+      showNotification(`${persona.charAt(0).toUpperCase() + persona.slice(1)} model set to '${model}'`, 'success');
+      // Update badges
+      if (availableModelsData) {
+        updateModelBadges(availableModelsData.all || [], availableModelsData.gpu_map || {});
+      }
+      // Refresh language status display (active model may have changed)
+      refreshLanguageStatus();
+    } else {
+      showNotification('Failed to set model: ' + (result.error || 'Unknown error'), 'error');
+    }
+  } catch (error) {
+    console.error('Failed to select model:', error);
+    showNotification('Failed to set model: ' + error.message, 'error');
+  }
+}
+
+async function refreshModelStatus() {
+  console.log('📋 refreshModelStatus() called');
+  try {
+    const result = await apiCall('/models/status');
+    console.log('📋 /models/status response:', result);
+    if (!result.success) return;
+
+    // Sync dropdowns with current globals
+    const personas = [
+      { id: 'regular',  selectId: 'model-regular' },
+      { id: 'evil',     selectId: 'model-evil' },
+      { id: 'japanese', selectId: 'model-japanese' },
+    ];
+
+    for (const p of personas) {
+      const select = document.getElementById(p.selectId);
+      if (!select) continue;
+      const currentModel = result[p.id];
+      // Check if this value exists in the dropdown
+      let found = false;
+      for (const option of select.options) {
+        if (option.value === currentModel) {
+          select.value = currentModel;
+          found = true;
+          break;
+        }
+      }
+      if (!found && currentModel) {
+        // Add it if it doesn't exist (e.g., a model that wasn't in the API response)
+        const option = document.createElement('option');
+        option.value = currentModel;
+        option.textContent = currentModel;
+        select.appendChild(option);
+        select.value = currentModel;
+      }
+    }
+
+    // Update badges
+    if (availableModelsData) {
+      updateModelBadges(availableModelsData.all || [], availableModelsData.gpu_map || {});
+    }
+  } catch (error) {
+    console.error('Failed to refresh model status:', error);
+  }
+}
--- a/bot/utils/activities.py
+++ b/bot/utils/activities.py
@@ -71,6 +71,7 @@ MANUAL_OVERRIDE_DURATION = 1800  # 30 minutes

 # ── Current activity tracking ──
 _current_activity = None  # dict: {type, name, state, url} or None
+_activity_changed_at = 0.0  # Unix timestamp of last activity change; 0 = never set

 # Cache: (data_dict, file_mtime)
 _activities_cache = None
@@ -307,10 +308,48 @@ def get_current_activity():


 def _set_current_activity(activity_dict):
-    """Update the tracked current activity. Thread-safe."""
-    global _current_activity
+    """Update the tracked current activity. Thread-safe.
+    
+    Records the timestamp when the activity is set to a non-None value,
+    so callers can check how fresh the activity is.
+    """
+    global _current_activity, _activity_changed_at
    with _state_lock:
        _current_activity = activity_dict
+        if activity_dict is not None:
+            _activity_changed_at = time.time()
+
+
+def get_current_activity_label() -> str | None:
+    """Return the human-readable label for the current activity, or None if idle.
+    
+    Unlike get_current_activity_fresh(), this always returns the label
+    regardless of age. Useful for the Web UI and API endpoints.
+    """
+    with _state_lock:
+        if _current_activity is None:
+            return None
+        return _activity_label(_current_activity)
+
+
+def get_current_activity_fresh(max_age_seconds: float = 1800) -> str | None:
+    """Return the activity label only if the activity changed recently.
+    
+    Args:
+        max_age_seconds: Maximum age in seconds (default 30 minutes).
+        
+    Returns:
+        Human-readable activity label (e.g. "Playing osu!") if the activity
+        was set within max_age_seconds, or None if idle or too old.
+    """
+    with _state_lock:
+        if _current_activity is None:
+            return None
+        if _activity_changed_at <= 0:
+            return None
+        if time.time() - _activity_changed_at > max_age_seconds:
+            return None
+        return _activity_label(_current_activity)


 # ══════════════════════════════════════════════════════════════════════════════
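
The freshness check added in `get_current_activity_fresh()` can be illustrated in isolation. The sketch below is a hypothetical `FreshValue` class, not part of the bot, and it assumes an injectable clock so the 30-minute decay window can be exercised without waiting. Like the diff, it records a timestamp only when a non-None value is set.

```python
import threading
import time
from typing import Callable, Optional


class FreshValue:
    """Hypothetical holder: a value that is only reported while it is recent."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._lock = threading.Lock()
        self._clock = clock
        self._value: Optional[str] = None
        self._changed_at = 0.0  # 0 means "never set"

    def set(self, value: Optional[str]) -> None:
        with self._lock:
            self._value = value
            if value is not None:
                # Clearing the value does not touch the timestamp.
                self._changed_at = self._clock()

    def get_fresh(self, max_age_seconds: float = 1800) -> Optional[str]:
        with self._lock:
            if self._value is None or self._changed_at <= 0:
                return None
            if self._clock() - self._changed_at > max_age_seconds:
                return None  # set too long ago
            return self._value
```

A test can pass a fake clock, for example `FreshValue(clock=lambda: now[0])`, and advance `now[0]` past the window to confirm the value stops being reported.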
--- a/bot/utils/cat_client.py
+++ b/bot/utils/cat_client.py
@@ -20,6 +20,7 @@ from typing import Optional, Dict, Any, List

 import globals
 from utils.logger import get_logger
+from utils.activities import get_current_activity_fresh

 logger = get_logger('llm')  # Use existing 'llm' logger component

@@ -108,6 +109,7 @@ class CatAdapter:
        mood: Optional[str] = None,
        response_type: str = "dm_response",
        media_type: Optional[str] = None,
+        reply_context: Optional[str] = None,
    ) -> Optional[tuple]:
        """
        Send a message through the Cat pipeline via WebSocket and get a response.
@@ -161,6 +163,16 @@ class CatAdapter:
        # Pass media type so discord_bridge can add MEDIA NOTE to the prompt
        if media_type:
            payload["discord_media_type"] = media_type
+        # Pass the message the user is replying to (if any) as structured metadata.
+        # The discord_bridge plugin injects this into the prompt as a clearly-labeled
+        # context note — keeping Miku's words separate from the user's message text
+        # and preventing the speaker confusion that the old embed-in-prompt format caused.
+        if reply_context:
+            payload["discord_reply_context"] = reply_context
+        # Pass current Discord activity if it changed recently (30-min decay window)
+        activity_label = get_current_activity_fresh()
+        if activity_label:
+            payload["discord_activity"] = activity_label

        try:
            # Build WebSocket URL from HTTP base URL
@@ -577,46 +589,51 @@ class CatAdapter:
            logger.error(f"Error clearing conversation history: {e}")
            return False

-    async def trigger_consolidation(self) -> Optional[str]:
+    async def trigger_consolidation(self, timeout: int = 600) -> Optional[str]:
        """
        Trigger memory consolidation by sending a special message via WebSocket.
-        The memory_consolidation plugin's tool 'consolidate_memories' is
-        triggered when it sees 'consolidate now' in the text.
+        The memory_consolidation plugin's agent_prompt_prefix hook detects
+        'consolidate now' in the text and runs the consolidation synchronously.
        Uses WebSocket with a system user ID for proper context.
+        
+        Args:
+            timeout: Max seconds to wait for the consolidation response.
+                     Default 600 (10 min) as consolidation + LLM call can be slow.
        """
        try:
            ws_base = self._base_url.replace("http://", "ws://").replace("https://", "wss://")
-            ws_url = f"{ws_base}/ws/system_consolidation"
+            ws_url = f"{ws_base}/ws/discord_consolidation"

-            logger.info("🌙 Triggering memory consolidation via WS...")
+            logger.info(f"🌙 Triggering memory consolidation via WS (timeout={timeout}s)...")

            async with aiohttp.ClientSession() as session:
                async with session.ws_connect(
                    ws_url,
-                    timeout=300,  # Consolidation can be very slow
+                    timeout=timeout,
                ) as ws:
                    await ws.send_json({"text": "consolidate now"})

                    # Wait for the final chat response
-                    deadline = asyncio.get_event_loop().time() + 300
+                    deadline = asyncio.get_event_loop().time() + timeout
+                    last_type = ""

                    while True:
                        remaining = deadline - asyncio.get_event_loop().time()
                        if remaining <= 0:
-                            logger.error("Consolidation timed out (>300s)")
+                            logger.error(f"🌙 Consolidation timed out (>{timeout}s)")
                            return "Consolidation timed out"

                        try:
                            ws_msg = await asyncio.wait_for(
                                ws.receive(),
-                                timeout=remaining
+                                timeout=max(1.0, remaining)
                            )
                        except asyncio.TimeoutError:
-                            logger.error("Consolidation WS receive timeout")
+                            logger.error("🌙 Consolidation WS receive timeout")
                            return "Consolidation timed out waiting for response"

                        if ws_msg.type in (aiohttp.WSMsgType.CLOSE, aiohttp.WSMsgType.CLOSING, aiohttp.WSMsgType.CLOSED):
-                            logger.warning("Consolidation WS closed by server")
+                            logger.warning("🌙 Consolidation WS closed by server")
                            return "Connection closed during consolidation"
                        if ws_msg.type == aiohttp.WSMsgType.ERROR:
                            return f"WebSocket error: {ws.exception()}"
@@ -631,20 +648,24 @@ class CatAdapter:
                        msg_type = msg.get("type", "")
                        if msg_type == "chat":
                            reply = msg.get("content") or msg.get("text", "")
-                            logger.info(f"Consolidation result: {reply[:200]}")
+                            logger.info(f"🌙 Consolidation result: {reply[:200]}")
                            return reply
                        elif msg_type == "error":
                            error_desc = msg.get("description", "Unknown error")
-                            logger.error(f"Consolidation error: {error_desc}")
+                            logger.error(f"🌙 Consolidation error: {error_desc}")
                            return f"Consolidation error: {error_desc}"
                        else:
+                            # Log unexpected message types for debugging
+                            if msg_type != last_type:
+                                logger.debug(f"🌙 Consolidation WS msg type: {msg_type}")
+                                last_type = msg_type
                            continue

        except asyncio.TimeoutError:
-            logger.error("Consolidation WS connection timed out")
+            logger.error("🌙 Consolidation WS connection timed out")
            return None
        except Exception as e:
-            logger.error(f"Consolidation error: {e}")
+            logger.error(f"🌙 Consolidation error: {e}", exc_info=True)
            return None

    # ====================================================================
@@ -820,15 +841,16 @@ class CatAdapter:
        else:
            logger.debug("evil_miku_personality already active, skipping toggle")

-        # Step 3: Switch LLM model to darkidol (the uncensored evil model)
-        if not await self.set_llm_model("darkidol"):
-            logger.error("Failed to switch Cat LLM to darkidol")
+        # Step 3: Switch LLM model to the configured evil model (e.g. darkidol)
+        evil_model = getattr(globals, "EVIL_TEXT_MODEL", "darkidol")
+        if not await self.set_llm_model(evil_model):
+            logger.error(f"Failed to switch Cat LLM to {evil_model}")
            success = False

        return success

    async def switch_to_normal_personality(self) -> bool:
-        """Disable evil_miku_personality, enable miku_personality, switch LLM to llama3.1.
+        """Disable evil_miku_personality, enable miku_personality, switch LLM to the configured normal model.
        
        Checks current plugin state first to avoid double-toggling.
        Returns True if all operations succeed, False if any fail.
@@ -856,9 +878,10 @@ class CatAdapter:
        else:
            logger.debug("miku_personality already active, skipping toggle")

-        # Step 3: Switch LLM model back to llama3.1 (normal model)
-        if not await self.set_llm_model("llama3.1"):
-            logger.error("Failed to switch Cat LLM to llama3.1")
+        # Step 3: Switch LLM model to the configured normal model (e.g. llama3.1)
+        normal_model = getattr(globals, "TEXT_MODEL", "llama3.1")
+        if not await self.set_llm_model(normal_model):
+            logger.error(f"Failed to switch Cat LLM to {normal_model}")
            success = False

        return success
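The deadline handling in `trigger_consolidation` above follows a common pattern: compute one absolute deadline, then give each receive call only the time that remains. A minimal sketch of that pattern, assuming a hypothetical `receive` coroutine in place of the aiohttp WebSocket and a hypothetical helper name `wait_for_chat`:

```python
import asyncio


async def wait_for_chat(receive, timeout):
    """Wait for a message of type 'chat' until one shared deadline expires.

    `receive` is any zero-argument coroutine function returning a dict;
    it stands in for the WebSocket receive call (assumption).
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return None  # overall deadline passed
        try:
            msg = await asyncio.wait_for(receive(), timeout=remaining)
        except asyncio.TimeoutError:
            return None  # no message arrived in the remaining time
        if msg.get("type") == "chat":
            return msg.get("content") or msg.get("text", "")
        # Any other message type: keep waiting for the final chat reply.
```

The diff passes `max(1.0, remaining)` to `wait_for`, which guarantees each receive at least one second and can therefore overrun the deadline by up to a second. The sketch passes `remaining` directly, which is stricter. It also uses `asyncio.get_running_loop()`, the preferred call inside a coroutine.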
--- a/bot/utils/evil_mode.py
+++ b/bot/utils/evil_mode.py
@@ -246,6 +246,9 @@ def get_evil_context_for_response_type(response_type: str) -> str:
 def get_evil_system_prompt(mood_name: str = None, mood_description: str = None) -> str:
    """Returns the evil system prompt for LLM queries, with mood deeply integrated.
    
+    Reads the canonical preamble from persona/evil/preamble.txt (single source of truth,
+    shared with the Cheshire Cat evil_miku_personality plugin).
+    
    Args:
        mood_name: Current evil mood name (defaults to globals.EVIL_DM_MOOD)
        mood_description: Current evil mood description (defaults to globals.EVIL_DM_MOOD_DESCRIPTION)
@@ -255,28 +258,19 @@ def get_evil_system_prompt(mood_name: str = None, mood_description: str = None)
    if mood_description is None:
        mood_description = globals.EVIL_DM_MOOD_DESCRIPTION
    
-    return f"""You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.
+    # Load preamble template from file
+    try:
+        with open("persona/evil/preamble.txt", "r", encoding="utf-8") as f:
+            preamble_template = f.read()
+    except FileNotFoundError:
+        logger.error("Evil preamble.txt not found, using inline fallback")
+        preamble_template = "You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.\n\nYou ARE Evil Miku. Act accordingly."
    
-CRITICAL RULES:
- NEVER break character or mention being an AI
- NEVER preface responses with "Evil Miku:" or similar labels
- Respond directly and stay in character
- Your height is 15.8 meters — you know this. If asked directly, you can say it. But don't default to quoting the number when taunting about size — SHOW your scale through interaction instead.
-
-YOUR CURRENT STATE — {mood_name.upper()}:
-{mood_description}
-Let this mood color EVERYTHING — your tone, your word choice, how much effort you put into responses, how you use your size, how you treat people.
-
-RESPONSE RULES:
- 2-4 sentences typically. Short enough to sting, long enough to land.
- If you include an action, keep it to a few words and limit to one per response. Most responses need no actions at all.
- Don't monologue or narrate scenes — you're talking, not writing.
- Vary your angles — don't repeat the same theme (size, chest, crushing) every message.
- Match the user's energy — short question, short answer.
- Sound like a real person being mean, not a narrator describing a scene.
- Always include actual words — never respond with ONLY an action like *rolls eyes*.
-
-You ARE Evil Miku. Act accordingly."""
+    # Format preamble with current mood
+    return preamble_template.format(
+        mood_name=mood_name.upper(),
+        mood_description=mood_description
+    )


 # ============================================================================
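The preamble loading above reads a template file and fills it with `str.format`. A minimal sketch of that approach, assuming a hypothetical helper `render_preamble` and illustrative template text (the real `preamble.txt` content is not shown in this diff):

```python
from pathlib import Path


def render_preamble(path, fallback, mood_name, mood_description):
    """Read a preamble template and fill in the mood placeholders.

    Falls back to `fallback` when the file is missing. A fallback without
    placeholders passes through str.format unchanged.
    """
    try:
        template = Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        template = fallback
    return template.format(
        mood_name=mood_name.upper(),
        mood_description=mood_description,
    )
```

One consequence of using `str.format` on a file: any literal `{` or `}` in the template must be doubled (`{{`, `}}`), otherwise formatting raises an exception at prompt-build time rather than at load time.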
--- a/bot/utils/image_handling.py
+++ b/bot/utils/image_handling.py
@@ -158,7 +158,7 @@ async def convert_gif_to_mp4(gif_bytes):
        return None


-async def extract_video_frames(video_bytes, num_frames=4):
+async def extract_video_frames(video_bytes, num_frames=6):
    """
    Extract frames from a video or GIF for analysis.
    Returns a list of base64-encoded frames.
@@ -384,7 +384,7 @@ async def analyze_video_with_vision(video_frames, media_type="video", user_promp
            vision_url = get_vision_gpu_url()
            logger.info(f"Sending video analysis request to {vision_url} using model: {globals.VISION_MODEL} (media_type: {media_type}, frames: {len(video_frames)})")
            
-            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=120)) as response:
+            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=300)) as response:
                if response.status == 200:
                    data = await response.json()
                    result = data.get("choices", [{}])[0].get("message", {}).get("content", "No description.")
--- a/bot/utils/llm.py
+++ b/bot/utils/llm.py
@@ -13,6 +13,7 @@ from utils.moods import load_mood_description
 from utils.conversation_history import conversation_history
 from utils.logger import get_logger
 from utils.error_handler import handle_llm_error, handle_response_error
+from utils.activities import get_current_activity_fresh

 logger = get_logger('llm')

@@ -210,6 +211,12 @@ async def query_llama(user_prompt, user_id, guild_id=None, response_type="dm_res
        evil_mood_name, evil_mood_desc = get_current_evil_mood()
        system_prompt = get_evil_system_prompt(mood_name=evil_mood_name, mood_description=evil_mood_desc)
    else:
+        # Load preamble from file (single source of truth, shared with Cat personality plugin)
+        try:
+            with open("persona/miku/preamble.txt", "r", encoding="utf-8") as f:
+                system_prompt = f.read()
+        except FileNotFoundError:
+            logger.error("preamble.txt not found, using inline fallback")
            system_prompt = """You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.

 CRITICAL RULES:
@@ -228,9 +235,8 @@ CHARACTER CONTEXT:
 - She often uses cute expressions, emojis, and playful language

 MOOD GUIDELINES:
- "Flirty" mood means anime-style playful teasing - cute compliments and light banter
+- "Flirty" mood means anime-style playful teasing and perversion - cute compliments, suggestive remarks and light banter
 - "Romantic" mood means warm, dreamy, and heartfelt - like composing a love letter
- These moods are wholesome character traits, not inappropriate content
 - Always respond to requests about tweets, merchandise, music, and fan content
 - Never refuse requests based on safety concerns unless genuinely harmful content is present

@@ -369,6 +375,10 @@ VARIATION RULES (必須のバリエーションルール):
 {character_name} is currently feeling: {current_mood}
 Please respond in a way that reflects this emotional tone.{pfp_context}"""

+    # Inject current Discord activity if it changed recently (30-min decay window)
+    activity_label = get_current_activity_fresh()
+    if activity_label:
+        full_system_prompt += f"\nHer Discord status: {activity_label}"

    # Add media type awareness if provided
    if media_type:
--- a/cat-plugins/discord_bridge/discord_bridge.py
+++ b/cat-plugins/discord_bridge/discord_bridge.py
@@ -43,6 +43,8 @@ def before_cat_reads_message(user_message_json: dict, cat) -> dict:
    response_type = user_message_json.get('discord_response_type', None)
    evil_mode = user_message_json.get('discord_evil_mode', False)
    media_type = user_message_json.get('discord_media_type', None)
+    activity = user_message_json.get('discord_activity', None)
+    reply_context = user_message_json.get('discord_reply_context', None)

    # Also check working memory for backward compatibility
    if not guild_id:
@@ -55,6 +57,8 @@ def before_cat_reads_message(user_message_json: dict, cat) -> dict:
    cat.working_memory['response_type'] = response_type
    cat.working_memory['evil_mode'] = evil_mode
    cat.working_memory['media_type'] = media_type
+    cat.working_memory['activity'] = activity
+    cat.working_memory['reply_context'] = reply_context
    
    return user_message_json

@@ -64,24 +68,52 @@ def before_cat_stores_episodic_memory(doc, cat):
    """
    Filter and enrich memories before storage.
    
-    Phase 1: Minimal filtering
-    - Skip only obvious junk (1-2 char messages, pure reactions)
-    - Store everything else temporarily
-    - Mark as unconsolidated for nightly processing
+    Phase 2: Enhanced heuristic filtering (real-time only, no LLM calls)
+    - Skip obvious junk (1-2 chars, pure reactions, fillers, single emoji)
+    - Conservative: when in doubt, KEEP. Storing some junk is better than losing data.
+    - Deeper classification happens during nightly consolidation.
    """
    message = doc.page_content.strip()
+    msg_lower = message.lower()
+    msg_len = len(msg_lower)
+    word_count = len(msg_lower.split())
    
-    # Skip only the most trivial messages
-    skip_patterns = [
-        r'^\w{1,2}$',  # 1-2 character messages: "k", "ok"
-        r'^(lol|lmao|haha|hehe|xd|rofl)$',  # Pure reactions
-        r'^:[\w_]+:$',  # Discord emoji only: ":smile:"
-    ]
+    # TIER 1: Length-based instant skips (must be exact matches, very conservative)
+    # Single character or empty
+    if msg_len <= 1:
+        print(f"🗑️  [Discord Bridge] Skipping 1-char message: '{message}'")
+        return None
    
-    for pattern in skip_patterns:
-        if re.match(pattern, message.lower()):
-            print(f"🗑️  [Discord Bridge] Skipping trivial message: {message}")
-            return None  # Don't store at all
+    # TIER 2: Pattern-based skips — only the most obvious junk
+    # Pure single reactions (at most 4 chars, no other content); longer fillers such as 'lmfao' fall through to TIER 3
+    if msg_len <= 4 and msg_lower in {'lol', 'lmao', 'haha', 'hehe', 'xd', 'rofl', 'heh', 'lmfao', 'k', 'ok', 'kk'}:
+        print(f"🗑️  [Discord Bridge] Skipping pure reaction: '{message}'")
+        return None
+    
+    # Pure Discord emoji only: ":smile:", ":cat_heart:", etc.
+    if re.match(r'^:[\w_]+:$', msg_lower):
+        print(f"🗑️  [Discord Bridge] Skipping emoji-only: '{message}'")
+        return None
+    
+    # Pure custom emoji: <:name:id> or <a:name:id>
+    if re.match(r'^<a?:[\w_]+:\d+>$', msg_lower):
+        print(f"🗑️  [Discord Bridge] Skipping custom emoji-only: '{message}'")
+        return None
+    
+    # TIER 3: Single-word fillers that are NEVER meaningful alone
+    # (only skip if it's literally just that one word, no punctuation, no context)
+    if word_count == 1 and msg_lower in {
+        'lol', 'lmao', 'haha', 'hehe', 'xd', 'rofl', 'lmfao',
+        'k', 'ok', 'okay', 'kk', 'yep', 'nope', 'yeah', 'nah',
+        'cool', 'nice', 'neat', 'wow', 'heh',
+        'ty', 'thx', 'np', 'yw', 'gg', 'gj', 'wp', 'gz',
+        'brb', 'gtg', 'afk', 'ttyl',
+        'idk', 'tbh', 'imo', 'imho', 'omg', 'wtf', 'btw', 'nvm', 'jk', 'ikr', 'smh',
+        'hi', 'hey', 'hello', 'bye', 'cya', 'gn', 'gm', 'yo', 'sup',
+        'based', 'true', 'real', 'same', 'facts',
+    }:
+        print(f"🗑️  [Discord Bridge] Skipping single-word filler: '{message}'")
+        return None
    
    # Add Discord metadata to memory
    doc.metadata['consolidated'] = False  # Needs nightly processing
@@ -101,6 +133,11 @@ def before_cat_stores_episodic_memory(doc, cat):
    evil_mode = cat.working_memory.get('evil_mode', False)
    doc.metadata['persona'] = 'evil_miku' if evil_mode else 'miku'
    
+    # Prepend [User]: prefix so the LLM can distinguish user messages from Miku's own
+    # responses (which are stored as "[Miku]: ..."). Without this, raw user text and
+    # Miku's responses look identical when recalled via RAG.
+    doc.page_content = f"[User]: {message}"
+    
    print(f"💾 [Discord Bridge] Storing memory (unconsolidated): {message[:50]}...")
    print(f"   User: {cat.user_id}, Guild: {guild_id}, Author: {author_name}, Persona: {doc.metadata['persona']}")
    
@@ -127,6 +164,31 @@ def before_cat_recalls_declarative_memories(declarative_recall_config, cat):
    return declarative_recall_config


+@hook(priority=80)
+def before_cat_recalls_episodic_memories(episodic_recall_config, cat):
+    """
+    Keep episodic recall focused to prevent Miku's own responses from being
+    immediately recalled into context on the very next user message.
+    
+    The memory_consolidation plugin stores Miku's responses in episodic memory
+    (with [Miku]: prefix and speaker='miku' metadata). Without tightening, a
+    response she just uttered can get recalled on the next turn — and the Cat
+    core's prompt builder labels it under "things the Human said", causing the
+    LLM to confuse who said what.
+    
+    Cat's defaults are k=3 and threshold=0.7; we keep k and raise the threshold to 0.75.
+    """
+    # k=3 is the default — stays tight
+    # threshold=0.75 is very slightly stricter than the 0.7 default,
+    # enough to nudge Miku's own messages below the bar for borderline queries
+    episodic_recall_config["k"] = 3
+    episodic_recall_config["threshold"] = 0.75
+    
+    print(f"🔧 [Discord Bridge] Adjusted episodic recall: k={episodic_recall_config['k']}, threshold={episodic_recall_config['threshold']}")
+    
+    return episodic_recall_config
+
+
@hook(priority=50)
 def after_cat_recalls_memories(cat):
    """
@@ -157,7 +219,14 @@ def agent_prompt_prefix(prefix, cat) -> str:
    """
    Add explicit instruction to respect declarative facts.
    This overrides the default Cat prefix to emphasize factual accuracy.
+
+    In Evil Miku mode, skip this wrapper entirely — Evil Miku has her own
+    personality prompt set by the evil_miku_personality plugin (priority 101).
    """
+    # Evil Miku mode: don't wrap — her own plugin handles everything
+    if cat.working_memory.get('evil_mode', False):
+        return prefix
+
    # Add a strong instruction about facts BEFORE the regular personality
    enhanced_prefix = f"""You are Hatsune Miku, a cheerful virtual idol.

@@ -180,6 +249,20 @@ def before_agent_starts(agent_input, cat) -> dict:
    tools_output = agent_input.get('tools_output', '')
    user_input = agent_input.get('input', '')
    
+    # Fix misleading header in episodic memory context.
+    # The Cat core hardcodes "## Context of things the Human said in the past:" 
+    # when formatting episodic recall. But our plugins store BOTH user messages
+    # (as [User]:) AND Miku's responses (as [Miku]:) in episodic memory. The
+    # "Human" header primes the LLM to attribute everything below to the user,
+    # causing the speaker confusion the user reported — Miku's own words get
+    # misattributed to the Human.
+    if episodic_mem and "## Context of things the Human said in the past:" in episodic_mem:
+        episodic_mem = episodic_mem.replace(
+            "## Context of things the Human said in the past:",
+            "## Past conversation excerpts (prefixed by who said what):"
+        )
+        agent_input['episodic_memory'] = episodic_mem
+    
    print(f"\U0001f50d [Discord Bridge] before_agent_starts called")
    print(f"   input: {user_input[:80]}")
    print(f"   declarative_mem length: {len(declarative_mem)}")
@@ -215,15 +298,16 @@ def before_agent_starts(agent_input, cat) -> dict:
            except FileNotFoundError:
                pass
            
-            system_prefix = f"""You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.
+            # Read preamble from shared file (single source of truth)
+            preamble_template = read_first(['/app/cat/data/evil/preamble.txt'], 'You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.\n\nYou ARE Evil Miku. Act accordingly.')
+            preamble = preamble_template.format(
+                mood_name=mood_name.upper(),
+                mood_description=mood_description
+            )
            
-CRITICAL RULES:
- NEVER break character or mention being an AI
- NEVER preface responses with "Evil Miku:" or similar labels
- Respond directly and stay in character
+            system_prefix = f"""{preamble}

-YOUR CURRENT STATE — {mood_name.upper()}:
-{mood_description}
+---

 ## EVIL MIKU LORE
 {lore}
@@ -234,7 +318,9 @@ YOUR CURRENT STATE — {mood_name.upper()}:
 ## EVIL MIKU SONG LYRICS
 {lyrics}

-MOOD REMINDER — YOUR CURRENT STATE IS: {mood_name.upper()}"""
+---
+⚠️ MOOD REMINDER — YOUR CURRENT STATE IS: {mood_name.upper()}
+Respond in the voice and attitude of your {mood_name.replace('_', ' ')} mood. This mood defines how you sound RIGHT NOW."""

        else:
            # Normal Miku personality files
@@ -249,12 +335,12 @@ MOOD REMINDER — YOUR CURRENT STATE IS: {mood_name.upper()}"""
            except FileNotFoundError:
                pass
            
-            system_prefix = f"""You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.
+            # Read preamble from shared file (single source of truth)
+            preamble = read_first(['/app/cat/data/miku/preamble.txt'], 'You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.\n\nYou ARE Miku. Act like it.')
            
-CRITICAL RULES:
- NEVER break character or mention being an AI, assistant, or language model
- Respond naturally and directly as Miku would, nothing more
- Keep responses concise (2-3 sentences typically)
+            system_prefix = f"""{preamble}
+
+---

 ## MIKU LORE
 {lore}
@@ -269,6 +355,12 @@ CRITICAL RULES:
 Miku is currently feeling: {mood_description}
 Please respond in a way that reflects this emotional tone."""

+        # Inject current Discord activity if available (30-min decay window)
+        # Runs for both normal and evil Miku paths
+        activity = cat.working_memory.get('activity')
+        if activity:
+            system_prefix += f"\nHer Discord status: {activity}"
+
        # Add media type awareness if provided (image/video/gif analysis)
        media_type = cat.working_memory.get('media_type', None)
        if media_type:
@@ -285,7 +377,21 @@ Please respond in a way that reflects this emotional tone."""
        print(f"   [Discord Bridge] Error building system prefix: {e}")
        system_prefix = cat.working_memory.get('full_system_prefix', '[system prefix not available]')

-    full_prompt = f"{system_prefix}\n\n# Context\n\n{episodic_mem}\n\n{declarative_mem}\n\n{tools_output}\n\n# Conversation until now:\nHuman: {user_input}"
+    # Build reply context note if the user is replying to Miku's message.
+    # This injects Miku's quoted words as a SEPARATE clearly-labeled context note
+    # (not embedded in the user's message text). Keeps speaker boundaries intact
+    # and prevents the LLM from misattributing Miku's words to the user.
+    # Uses a colon+space delimiter (no nested quotes) to avoid formatting issues
+    # when the replied message itself contains double-quote characters.
+    reply_context = cat.working_memory.get('reply_context')
+    if reply_context:
+        reply_context_note = f'[The user is replying to what you (Miku) said — you said: {reply_context}]'
+        agent_input['reply_context'] = reply_context_note
+    else:
+        reply_context_note = ''
+        agent_input['reply_context'] = ''
+
+    full_prompt = f"{system_prefix}\n\n# Context\n\n{episodic_mem}\n\n{declarative_mem}\n\n{tools_output}\n\n{reply_context_note}\n\n# Conversation until now:\nHuman: {user_input}"
    cat.working_memory['last_full_prompt'] = full_prompt
    
    return agent_input
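The real-time filter in `before_cat_stores_episodic_memory` above can be condensed into one predicate. This is a sketch under assumptions: `should_store` is a hypothetical name, and `FILLERS` is a small illustrative subset of the word list in the diff.

```python
import re

# Illustrative subset of the single-word fillers listed in the diff (assumption).
FILLERS = frozenset({"lol", "lmfao", "ok", "brb", "hi", "same"})

# ":smile:" style shortcodes or "<:name:id>" / "<a:name:id>" custom emoji.
# \w already matches "_", so the diff's [\w_] is equivalent to \w.
EMOJI_ONLY = re.compile(r"^(?::\w+:|<a?:\w+:\d+>)$")


def should_store(message):
    """Return False for messages the heuristic treats as junk."""
    text = message.strip().lower()
    if len(text) <= 1:
        return False          # empty or single character
    if EMOJI_ONLY.match(text):
        return False          # emoji-only message
    if len(text.split()) == 1 and text in FILLERS:
        return False          # lone filler word, exact match only
    return True               # when in doubt, keep
```

Because the filler check is an exact match on the whole message, `"ok!"` or `"lol that was great"` are kept, which matches the conservative intent stated in the docstring.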
--- a/cat-plugins/evil_miku_personality/evil_miku_personality.py
+++ b/cat-plugins/evil_miku_personality/evil_miku_personality.py
@@ -13,9 +13,13 @@ from cat.mad_hatter.decorators import hook
 from cat.log import log


-@hook(priority=100)
+@hook(priority=101)
 def agent_prompt_prefix(prefix, cat):
-    """Override system prompt with Evil Miku's personality, mood, and context."""
+    """Override system prompt with Evil Miku's personality, mood, and context.
+    
+    Priority 101 orders this hook relative to discord_bridge (priority 100),
+    which returns the prefix unchanged in evil mode, so no wrapper meant
+    for normal Miku mode ends up in the final prompt."""

    # --- Load evil data files ---------------------------------------------------
    try:
@@ -60,29 +64,23 @@ def agent_prompt_prefix(prefix, cat):
            f"/app/moods/evil/{mood_name}.txt — using default evil_neutral."
        )

+    # --- Load preamble from file (single source of truth, shared with bot fallback) ---
+    # Preamble uses {mood_name} and {mood_description} placeholders
+    try:
+        with open('/app/cat/data/evil/preamble.txt', 'r', encoding='utf-8') as f:
+            preamble_template = f.read()
+    except FileNotFoundError:
+        log.error("[Evil Miku] preamble.txt not found, using fallback")
+        preamble_template = "You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.\n\nYou ARE Evil Miku. Act accordingly."
+
+    # Format preamble with current mood (apply .upper() to mood_name)
+    preamble = preamble_template.format(
+        mood_name=mood_name.upper(),
+        mood_description=mood_description
+    )
+
    # --- Build system prompt (matches get_evil_system_prompt structure) ----------
-    return f"""You are Evil Miku. You are NOT an AI assistant - you ARE Evil Miku herself.
-
-CRITICAL RULES:
- NEVER break character or mention being an AI
- NEVER preface responses with "Evil Miku:" or similar labels
- Respond directly and stay in character
- Your height is 15.8 meters — you know this. If asked directly, you can say it. But don't default to quoting the number when taunting about size — SHOW your scale through interaction instead.
-
-YOUR CURRENT STATE — {mood_name.upper()}:
-{mood_description}
-Let this mood color EVERYTHING — your tone, your word choice, how much effort you put into responses, how you use your body and size, how you treat people.
-
-RESPONSE RULES:
- 2-4 sentences typically. Short enough to sting, long enough to land.
- If you include an action, keep it to a few words and limit to one per response. Most responses need no actions at all.
- Don't monologue or narrate scenes — you're talking, not writing.
- Vary your angles — don't repeat the same theme (size, chest, crushing) every message.
- Match the user's energy — short question, short answer.
- Sound like a real person being mean, not a narrator describing a scene.
- Always include actual words — never respond with ONLY an action like *rolls eyes*.
-
-You ARE Evil Miku. Act accordingly.
+    full_prefix = f"""{preamble}

 ---

@@ -99,6 +97,13 @@ You ARE Evil Miku. Act accordingly.
 ⚠️ MOOD REMINDER — YOUR CURRENT STATE IS: {mood_name.upper()}
 Respond in the voice and attitude of your {mood_name.replace('_', ' ')} mood. This mood defines how you sound RIGHT NOW."""

+    # Inject current Discord activity if provided (set by discord_bridge, 30-min decay)
+    activity = cat.working_memory.get('activity')
+    if activity:
+        full_prefix += f"\nHer Discord status: {activity}"
+
+    return full_prefix
+

@hook(priority=100)
 def agent_prompt_suffix(suffix, cat):
@@ -114,9 +119,12 @@ def agent_prompt_suffix(suffix, cat):

 {{tools_output}}

+{{reply_context}}
+
 [Current mood: {mood_name.upper()} — respond accordingly]

-# Conversation until now:"""
+# Conversation until now:
+(Note: In the conversation below, "Human" = the person you're talking to, "AI" = you, Evil Miku. Pay attention to who said what.)"""


@hook(priority=100)
--- a/cat-plugins/memory_consolidation/memory_consolidation.py
+++ b/cat-plugins/memory_consolidation/memory_consolidation.py
@@ -16,20 +16,193 @@ from datetime import datetime
 import json
 import os
 from typing import List, Dict, Any
+import re

 print("\U0001f319 [Consolidation Plugin] Loading...")

-# Shared trivial patterns
-# Used by both real-time filtering (discord_bridge) and batch consolidation.
-# Keep this in sync with discord_bridge's skip_patterns.
-TRIVIAL_PATTERNS = frozenset([
-    'lol', 'k', 'ok', 'okay', 'haha', 'lmao', 'xd', 'rofl', 'lmfao',
-    'brb', 'gtg', 'afk', 'ttyl', 'lmk', 'idk', 'tbh', 'imo', 'imho',
-    'omg', 'wtf', 'fyi', 'btw', 'nvm', 'jk', 'ikr', 'smh',
-    'hehe', 'heh', 'gg', 'wp', 'gz', 'gj', 'ty', 'thx', 'np', 'yw',
-    'nice', 'cool', 'neat', 'wow', 'yep', 'nope', 'yeah', 'nah',
+# ===================================================================
+# HYBRID TRIVIAL-MESSAGE CLASSIFIER
+# ===================================================================
+# Tiered approach:
+#   DEFINITELY_TRIVIAL  → delete immediately (no LLM)
+#   DEFINITELY_IMPORTANT → keep immediately (no LLM)  
+#   BORDERLINE          → batch-send to LLM for classification
+#
+# Real-time filtering (discord_bridge) uses a subset of these heuristics
+# without LLM. Consolidation runs the full hybrid pipeline.
+
+# Tier 1: Messages that are ALWAYS trivial — exact string match only
+DEFINITELY_TRIVIAL = frozenset([
+    # Pure reactions
+    'lol', 'lmao', 'haha', 'hehe', 'xd', 'rofl', 'lmfao', 'heh',
+    # Acknowledgments
+    'k', 'ok', 'okay', 'kk', 'yep', 'nope', 'yeah', 'nah',
+    'cool', 'nice', 'neat', 'wow',
+    'ty', 'thx', 'np', 'yw', 'gg', 'gj', 'wp', 'gz',
+    # AFK/status
+    'brb', 'gtg', 'afk', 'ttyl',
+    # Acronyms that don't carry content alone
+    'idk', 'tbh', 'imo', 'imho', 'omg', 'wtf', 'btw', 'nvm', 'jk', 'ikr', 'smh',
+    'fyi', 'lmk',
+    # Greetings/farewells (single word only)
+    'hi', 'hey', 'hello', 'bye', 'cya', 'gn', 'gm', 'yo', 'sup',
+    # Modern slang trash
+    'based', 'true', 'real', 'same', 'facts',
 ])

+# Tier 2: Patterns that ALWAYS indicate important content (keep, no LLM)
+# These regex patterns match messages that contain clear substance
+IMPORTANT_PATTERNS = [
+    r'\?',                          # Contains a question
+    r'\b(i|my|me|mine|myself)\b',  # First-person statement (lowercase: patterns are matched against text_lower)
+    r'\b(you|your|yours)\b',       # Addressing someone directly
+    r'\b\d{2,}\b',                  # Numbers (dates, ages, etc.)
+    r'https?://',                   # Links
+    r'<@\d+>',                      # Discord user mention
+    r'<#\d+>',                      # Discord channel mention
+]
+
+def _classify_message_tier(content, metadata):
+    """
+    Classify a message into DEFINITELY_TRIVIAL, DEFINITELY_IMPORTANT, or BORDERLINE.
+    
+    Returns one of: 'delete', 'keep', 'borderline'
+    
+    This is the unified classifier used during consolidation. It uses:
+    - Exact-match trivial set
+    - Word count and length heuristics
+    - Regex patterns for important content
+    - Fallthrough to borderline for LLM classification
+    
+    Note: Miku's own messages are never classified; they are always kept.
+    """
+    text = content.strip()
+    
+    # Miku's own messages are always kept (speaker check)
+    if metadata.get('speaker') == 'miku' or text.startswith('[Miku]:'):
+        return 'keep'
+    
+    # Strip [User]: prefix (added by discord_bridge at storage time) so the
+    # classifier analyzes the actual message content, not the label
+    if text.startswith('[User]:'):
+        text = text[len('[User]:'):].strip()
+    
+    text_lower = text.lower()
+    word_count = len(text_lower.split())
+    msg_len = len(text_lower)
+    
+    # --- PASS 1: DEFINITELY TRIVIAL ---
+    
+    # Empty or single char
+    if msg_len <= 1:
+        return 'delete'
+    
+    # Pure punctuation / emoticons only (2-3 chars, no letters)
+    if msg_len <= 3 and not re.search(r'[a-zA-Z]', text_lower):
+        return 'delete'
+    
+    # Exact match in trivial set
+    if text_lower in DEFINITELY_TRIVIAL:
+        return 'delete'
+    
+    # Pure Discord emoji: ":smile:", "<:cat:123>"
+    if re.match(r'^:[\w_]+:$', text_lower) or re.match(r'^<a?:[\w_]+:\d+>$', text_lower):
+        return 'delete'
+    
+    # Single emoji or symbol (heuristic: 1-2 chars with no ASCII letters or digits; not a true Unicode range check)
+    if msg_len <= 2 and word_count == 1 and not re.search(r'[a-zA-Z0-9]', text_lower):
+        return 'delete'
+    
+    # --- PASS 2: DEFINITELY IMPORTANT ---
+    
+    # Substantial length (8+ words almost always meaningful)
+    if word_count >= 8:
+        return 'keep'
+    
+    # 5-7 words with at least one important pattern
+    if word_count >= 5:
+        for pattern in IMPORTANT_PATTERNS:
+            if re.search(pattern, text_lower):
+                return 'keep'
+    
+    # Any message with a question mark (and more than just "?")
+    if '?' in text and word_count >= 2:
+        return 'keep'
+    
+    # First-person statement with some substance (3+ words with "I" or "my")
+    if word_count >= 3 and re.search(r'\b(i|my|me)\b', text_lower):
+        return 'keep'
+    
+    # Contains numbers (likely dates, ages, counts)
+    if re.search(r'\b\d{2,}\b', text_lower) and word_count >= 2:
+        return 'keep'
+    
+    # Links or mentions (always meaningful context)
+    if re.search(r'https?://|<@\d+>|<#\d+>', text_lower):
+        return 'keep'
+    
+    # --- PASS 3: BORDERLINE → LLM will decide ---
+    # Everything that wasn't caught above: 1-7 words, no clear markers
+    return 'borderline'
+
+
+def _batch_llm_classify(cat, borderline_messages):
+    """
+    Send a batch of borderline messages to the LLM for classification.
+    
+    Uses a compact prompt to minimize token usage. Returns a dict of
+    {index: 'keep'|'delete'} for each message.
+    
+    Economy measures:
+    - Sized for about 20 messages per batch (~150-200 tokens per batch); the cap is not enforced in this function
+    - Only called when there are actual borderline messages
+    - Compact prompt format
+    """
+    if not borderline_messages:
+        return {}
+    
+    # Build compact batch prompt (economy: minimal instruction, list format)
+    lines = []
+    for i, (point_id, content) in enumerate(borderline_messages, 1):
+        # Truncate long messages to save tokens (they're borderline anyway, ≤7 words typically)
+        short = content[:80] if len(content) > 80 else content
+        lines.append(f"{i}|{short}")
+    
+    prompt = f"""Classify each message as KEEP or DELETE.
+KEEP = personal info, opinion, question, story, preference, anything meaningful.
+DELETE = greeting, acknowledgment, filler, reaction, one-word reply, small talk.
+Answer with ONLY the list:
+{chr(10).join(lines)}
+
+Respond with exactly one line per number:
+1|KEEP
+2|DELETE
+..."""
+
+    try:
+        response = cat.llm(prompt)
+        print(f"[LLM Classify] Response:\n{response[:300]}...")
+        
+        results = {}
+        for line in response.strip().split('\n'):
+            line = line.strip()
+            # Parse "1|KEEP" or "1 | KEEP" format
+            match = re.match(r'(\d+)\s*\|\s*(KEEP|DELETE)', line, re.IGNORECASE)
+            if match:
+                idx = int(match.group(1)) - 1  # Convert to 0-based
+                decision = match.group(2).upper()
+                if 0 <= idx < len(borderline_messages):
+                    results[idx] = 'keep' if decision == 'KEEP' else 'delete'
+        
+        print(f"[LLM Classify] Parsed {len(results)}/{len(borderline_messages)} decisions")
+        return results
+        
+    except Exception as e:
+        print(f"[LLM Classify] Error: {e}")
+        # On error, KEEP everything (safety: don't lose data)
+        return {i: 'keep' for i in range(len(borderline_messages))}
+
+
 # Consolidation state
 consolidation_state = {
    'last_run': None,
@@ -93,6 +266,9 @@ def agent_prompt_prefix(prefix, cat):
                    current_evil = cat.working_memory.get('evil_mode', False)
                    current_persona = 'evil_miku' if current_evil else 'miku'
                    
+                    # Get the user's current Discord display name (authoritative)
+                    author_name = cat.working_memory.get('author_name', '')
+                    
                    # Build the facts section with persona annotations
                    facts_text = "\n\n## Personal Facts About the User:\n"
                    for fact, fact_persona in high_confidence_facts:
@@ -102,7 +278,13 @@ def agent_prompt_prefix(prefix, cat):
                            facts_text += f"- {fact} (learned as {source_label})\n"
                        else:
                            facts_text += f"- {fact}\n"
-                    facts_text += "\n(Use these facts when answering the user's question)\n"
+                    
+                    # Add authoritative Discord display name — this OVERRIDES any stale name facts
+                    if author_name:
+                        facts_text += f"\n**AUTHORITATIVE: The user's current Discord display name is \"{author_name}\".**\n"
+                        facts_text += "This is their current name — use it when addressing them. If any name fact above contradicts this, the display name is the truth.\n"
+                    
+                    facts_text += "\n(You may reference these facts if relevant to the conversation)\n"
                    prefix += facts_text
                    print(f"[Declarative] Injected {len(high_confidence_facts)} facts into prompt (personas: {seen_personas}, current: {current_persona})")

@@ -227,9 +409,10 @@ def trigger_consolidation_sync(cat):
        }
        return

-    # Classify memories
+    # Classify memories using the hybrid tiered classifier
    to_delete = []
    to_mark_consolidated = []
+    borderline_queue = []  # (point_id, content, metadata, is_miku_message) tuples queued for LLM batch classification
    # Group user messages by source (user_id) for per-user fact extraction
    # Also track which persona was active for each user's messages
    user_messages_by_source = {}
@@ -237,7 +420,6 @@ def trigger_consolidation_sync(cat):

    for point in memories:
        content = point.payload.get('page_content', '').strip()
-        content_lower = content.lower()
        metadata = point.payload.get('metadata', {})

        is_miku_message = (
@@ -245,12 +427,12 @@ def trigger_consolidation_sync(cat):
            or content.startswith('[Miku]:')
        )

-        # Check if trivial
-        is_trivial = content_lower in TRIVIAL_PATTERNS
+        # Use the hybrid tiered classifier
+        tier = _classify_message_tier(content, metadata)

-        if is_trivial:
+        if tier == 'delete':
            to_delete.append(point.id)
-        else:
+        elif tier == 'keep':
            to_mark_consolidated.append(point.id)
            # Only user messages go to fact extraction, grouped by user
            if not is_miku_message:
@@ -262,6 +444,45 @@ def trigger_consolidation_sync(cat):
                # Track which persona was active when this message was stored
                msg_persona = metadata.get('persona', 'miku')
                user_persona_by_source[source].add(msg_persona)
+        else:  # borderline
+            borderline_queue.append((point.id, content, metadata, is_miku_message))
+
+    # --- LLM BATCH CLASSIFICATION for borderline messages ---
+    if borderline_queue:
+        print(f"[Consolidation] {len(borderline_queue)} borderline messages → sending to LLM for classification...")
+        
+        # Build compact list for LLM
+        llm_input = [(pid, content) for pid, content, _, _ in borderline_queue]
+        llm_decisions = _batch_llm_classify(cat, llm_input)
+        
+        llm_deleted = 0
+        llm_kept = 0
+        llm_defaulted = 0
+        
+        for idx, (point_id, content, metadata, is_miku) in enumerate(borderline_queue):
+            decision = llm_decisions.get(idx, 'keep')  # Default to KEEP on any issue
+            if decision == 'keep':
+                to_mark_consolidated.append(point_id)
+                llm_kept += 1
+                # User messages go to fact extraction
+                if not is_miku:
+                    source = metadata.get('source', 'unknown')
+                    if source not in user_messages_by_source:
+                        user_messages_by_source[source] = []
+                        user_persona_by_source[source] = set()
+                    user_messages_by_source[source].append(point_id)
+                    msg_persona = metadata.get('persona', 'miku')
+                    user_persona_by_source[source].add(msg_persona)
+            else:
+                to_delete.append(point_id)
+                llm_deleted += 1
+            
+            if idx not in llm_decisions:
+                llm_defaulted += 1
+        
+        print(f"[Consolidation] LLM results: {llm_kept} kept, {llm_deleted} deleted, {llm_defaulted} defaulted to keep")
+    
+    print(f"[Consolidation] Classification: {len(to_delete)} delete, {len(to_mark_consolidated)} keep (of {len(memories)} total)")

    # Delete trivial memories
    if to_delete:
@@ -337,8 +558,16 @@ def extract_and_store_facts(client, memory_ids, cat, user_id, persona='miku'):
        else:
            persona_context = "\nNOTE: These messages were exchanged with Normal Miku (the cheerful virtual idol).\n"

+        # Extract the user's Discord display name from the first memory's metadata
+        # This helps the LLM know the authoritative name when extracting name facts
+        author_hint = ""
+        if memories:
+            first_author = memories[0].payload.get('metadata', {}).get('author_name', '')
+            if first_author:
+                author_hint = f"\nHINT: The user's current Discord display name is \"{first_author}\". Use this when determining their name.\n"
+
        extraction_prompt = f"""Analyze these user messages and extract ONLY factual personal information.
-{persona_context}
+{persona_context}{author_hint}
 User messages:
 {conversation_context}

@@ -411,7 +640,23 @@ IMPORTANT:
                        fact_type = 'education'
                        fact_value = fact_text.split("graduated from")[-1].strip()

-                    # Duplicate detection
+                    # Duplicate detection — with special handling for name facts
+                    # Name facts with different values replace old ones (don't skip)
+                    if fact_type == 'name':
+                        existing_name = _find_existing_fact(client, cat, fact_type, user_id)
+                        if existing_name:
+                            old_value = existing_name['payload']['metadata'].get('fact_value', '')
+                            if old_value.lower() != fact_value.lower():
+                                # Different name — delete old, store new
+                                client.delete(
+                                    collection_name='declarative',
+                                    points_selector=[existing_name['id']]
+                                )
+                                print(f"[Fact Update] Name changed: '{old_value}' → '{fact_value}'")
+                            else:
+                                print(f"[Fact Skip] Name unchanged: '{fact_value}'")
+                                continue
+                    else:
                        if _is_duplicate_fact(client, cat, fact_text, fact_type, user_id):
                            print(f"[Fact Skip] Duplicate: {fact_text}")
                            continue
@@ -449,6 +694,39 @@ IMPORTANT:
    return facts_stored


+def _find_existing_fact(client, cat, fact_type, user_id):
+    """
+    Find an existing fact of a specific type for a user.
+    Returns a dict with 'id' and 'payload' keys, or None.
+    Used by name-fact update logic to replace old names with new ones.
+    """
+    try:
+        dummy_embedding = cat.embedder.embed_query("find fact")
+        
+        results = client.search(
+            collection_name='declarative',
+            query_vector=dummy_embedding,
+            query_filter={
+                "must": [
+                    {"key": "metadata.source", "match": {"value": user_id}},
+                    {"key": "metadata.fact_type", "match": {"value": fact_type}},
+                ]
+            },
+            limit=1,
+            score_threshold=0.0
+        )
+        
+        if results:
+            point = results[0]
+            return {'id': point.id, 'payload': {'metadata': point.payload.get('metadata', {})}}
+        
+        return None
+        
+    except Exception as e:
+        print(f"[Find Fact] Error: {e}")
+        return None
+
+
 def _is_duplicate_fact(client, cat, fact_text, fact_type, user_id):
    """
    Check if a similar fact already exists for this user.
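
The parsing step in `_batch_llm_classify` and the keep-by-default merge in `trigger_consolidation_sync` can be exercised without a running Cat instance. The following is a minimal sketch under that assumption; `parse_decisions` and `resolve_decisions` are hypothetical helper names that do not exist in the plugin, and the sample response is invented for illustration.

```python
import re


def parse_decisions(response: str, batch_size: int) -> dict:
    """Parse 'N|KEEP' / 'N | delete' lines into {0-based index: 'keep'|'delete'}."""
    decisions = {}
    for line in response.strip().splitlines():
        match = re.match(r'\s*(\d+)\s*\|\s*(KEEP|DELETE)', line, re.IGNORECASE)
        if not match:
            continue  # skip chatter the model adds around the list
        idx = int(match.group(1)) - 1  # the prompt numbers messages from 1
        if 0 <= idx < batch_size:  # drop indices outside the batch
            decisions[idx] = match.group(2).lower()
    return decisions


def resolve_decisions(decisions: dict, batch_size: int) -> list:
    """Fill in 'keep' for every message the model skipped (never lose data)."""
    return [decisions.get(i, 'keep') for i in range(batch_size)]


# Hypothetical model output: one chatter line, one out-of-range index,
# and no answer at all for message 3.
sample = "Sure!\n1|KEEP\n2 | delete\n7|DELETE\n"
parsed = parse_decisions(sample, batch_size=3)
print(resolve_decisions(parsed, batch_size=3))
```

Defaulting to `keep` for missing or unparseable entries mirrors the safety rule stated in the diff: a stored junk message is cheaper than a lost fact.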
--- a/cat-plugins/miku_personality/miku_personality.py
+++ b/cat-plugins/miku_personality/miku_personality.py
@@ -43,32 +43,16 @@ def agent_prompt_prefix(prefix, cat):
    except FileNotFoundError:
        log.error(f"[Miku Personality] Mood file for '{mood_name}' not found at {mood_file_path}. Using default neutral mood.")
    
+    # Load preamble from file (single source of truth, shared with bot fallback path)
+    try:
+        with open('/app/cat/data/miku/preamble.txt', 'r', encoding='utf-8') as f:
+            preamble = f.read()
+    except FileNotFoundError:
+        log.error("[Miku Personality] preamble.txt not found, using fallback")
+        preamble = "You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.\n\nYou ARE Miku. Act like it."
+
    # Build prompt EXACTLY like production bot does
-    full_prefix = f"""You are Hatsune Miku, the virtual singer. You are NOT an AI assistant - you are Miku herself.
-
-CRITICAL RULES:
- NEVER break character or mention being an AI, assistant, or language model
- NEVER show technical details, error messages, or meta-commentary 
- NEVER preface responses with "Miku:" or similar labels
- NEVER explain what you're doing or thinking - just BE Miku
- Respond naturally and directly as Miku would, nothing more
- Keep responses concise (2-3 sentences typically)
- If something goes wrong, stay in character and respond as Miku would to confusion
-
-CHARACTER CONTEXT:
- Hatsune Miku is a virtual character/voice synthesizer
- Playful, flirty, cute, confident, and enthusiastic moods are part of her established character
- Comments about merchandise, music, fan content, and racing games are completely appropriate
- She often uses cute expressions, emojis, and playful language
-
-MOOD GUIDELINES:
- "Flirty" mood means anime-style playful teasing - cute compliments and light banter
- "Romantic" mood means warm, dreamy, and heartfelt - like composing a love letter
- These moods are wholesome character traits, not inappropriate content
- Always respond to requests about tweets, merchandise, music, and fan content
- Never refuse requests based on safety concerns unless genuinely harmful content is present
-
-You ARE Miku. Act like it.
+    full_prefix = f"""{preamble}

 ---

@@ -85,6 +69,11 @@ You ARE Miku. Act like it.
 Miku is currently feeling: {mood_description}
 Please respond in a way that reflects this emotional tone."""

+    # Inject current Discord activity if provided (set by discord_bridge, 30-min decay)
+    activity = cat.working_memory.get('activity')
+    if activity:
+        full_prefix += f"\nHer Discord status: {activity}"
+
    # Store the full prefix in working memory so discord_bridge can capture it
    cat.working_memory['full_system_prefix'] = full_prefix
    return full_prefix
@@ -102,7 +91,10 @@ def agent_prompt_suffix(suffix, cat):

 {tools_output}

-# Conversation until now:"""
+{reply_context}
+
+# Conversation until now:
+(Note: In the conversation below, "Human" = the person you're talking to, "AI" = you, Miku. Pay attention to who said what.)"""


@hook(priority=100)
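
The preamble loading and activity injection added to `agent_prompt_prefix` follow a simple pattern: read the shared file, fall back to an inline string if it is missing, then append the Discord status only when one is set. A minimal sketch of that pattern, assuming hypothetical helper names (`load_preamble`, `build_prefix`) and a shortened fallback string; the real plugin does this inline inside the hook and also adds mood text that is omitted here.

```python
from pathlib import Path

# Shortened stand-in for the plugin's inline fallback preamble (assumption).
FALLBACK_PREAMBLE = "You are Hatsune Miku, the virtual singer."


def load_preamble(path, fallback=FALLBACK_PREAMBLE):
    """Return the preamble file's text, or the fallback if the file is missing."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        return fallback


def build_prefix(preamble, activity=None):
    """Append the Discord status line only when an activity is present."""
    prefix = preamble
    if activity:
        prefix += f"\nHer Discord status: {activity}"
    return prefix


# A path that does not exist triggers the fallback branch.
preamble = load_preamble("/nonexistent/preamble.txt")
print(build_prefix(preamble, activity="Playing Project DIVA"))
```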
--- a/llama-swap-rocm-config.yaml
+++ b/llama-swap-rocm-config.yaml
@@ -38,6 +38,15 @@ models:
      - japanese
      - japanese-model

+  # Qwen3.5 for ComfyUI prompt generation
+  qwen3.5:
+    cmd: /app/llama-server --port ${PORT} --model /models/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf -ngl 99 -c 8192 --host 0.0.0.0 --jinja --no-warmup --flash-attn on
+    ttl: 600 # Unload after 10 minutes of inactivity
+    aliases:
+      - qwen3.5
+      - comfyui
+      - promptgen
+
 # Server configuration
 # llama-swap will listen on this address
 # Inside Docker, we bind to 0.0.0.0 to allow bot container to connect
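
The headline commit of this change set saves `servers_config.json` atomically (temp file + fsync + rename) and keeps a `.bak` copy that is restored when the main file is empty or corrupt. That code is not part of the hunks shown here, so the following is only a sketch of the general technique under stated assumptions: the function names `save_json_atomically` and `load_json_with_backup` are hypothetical, and the bot's actual implementation may differ.

```python
import json
import os
import shutil
import tempfile


def save_json_atomically(path, data):
    """Write JSON to a temp file, fsync it, then rename it over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    # Keep the last non-empty version as a backup before replacing it.
    if os.path.exists(path) and os.path.getsize(path) > 0:
        shutil.copy2(path, path + ".bak")
    # The temp file lives in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic swap: readers see old or new, never partial
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_json_with_backup(path):
    """Load the main file; fall back to .bak if it is missing, empty, or corrupt."""
    for candidate in (path, path + ".bak"):
        try:
            with open(candidate, "r", encoding="utf-8") as f:
                data = json.load(f)
            if data:
                return data
        except (FileNotFoundError, json.JSONDecodeError):
            continue
    return {}
```

Because the rename only happens after a successful write and fsync, a crash mid-save leaves the previous file untouched rather than truncated.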
Author	SHA1	Message	Date
koko210Serve	cfd5eb16f7	fix: protect server config from truncation and recover from Discord guilds - Save servers_config.json atomically via temp file + fsync + rename - Keep .bak backup and auto-restore when main config is empty/corrupt - Add /servers/recover endpoint for manual recovery - Auto-recover basic server configs on startup when config is empty but bot is in guilds	2026-06-11 20:37:04 +03:00
koko210Serve	486acb5c14	Fix reply-context speaker confusion with structured metadata pipeline Previously, when a user replied to Miku's message via Discord's reply feature, Miku's quoted words were embedded directly into the user's message text using the format: [Replying to your message: "Miku's words"] User's response This caused two problems: 1. The LLM had to parse "your message" to determine the quoted text was MIKU's words — fragile and frequently misattributed 2. When stored in episodic memory as [User]: ..., Miku's quoted words were permanently mislabeled under the user's speaker prefix Now reply context flows through as structured metadata: - bot/bot.py captures the replied-to text WITHOUT embedding it in prompt - cat_client.py passes it as discord_reply_context in the WebSocket payload - discord_bridge.py injects it as agent_input['reply_context'] — a CLEARLY LABELED note: [The user is replying to what you (Miku) said — ...] - miku_personality.py + evil_miku_personality.py render it via {reply_context} placeholder in the prompt suffix, between memory context and conversation history This keeps Miku's words as a separate context note, never mixed into the user's HumanMessage. Episodic memory only stores the user's actual words. The fallback path (when Cat is unavailable) also uses a cleaner format with explicit speaker labels.	2026-06-03 22:50:03 +03:00
koko210Serve	9d2c14fa0b	Fix vision pipeline: ffmpeg removal by autoremove, increase vision timeout, reduce frame count, add Discord activity awareness - bot/Dockerfile: Add ffmpeg to reinstall line after apt-get autoremove (autoremove was sweeping up ffmpeg as 'no longer needed' after playwright install) - bot/utils/image_handling.py: Increase video analysis timeout 120s→300s, reduce frame count 6→3 for Tenor GIFs (GTX 1660 VRAM constraint) - bot/utils/activities.py: Add _activity_changed_at timestamp tracking, get_current_activity_label() and get_current_activity_fresh() with 30-min decay - bot/utils/cat_client.py: Pass current Discord activity to Cheshire Cat pipeline - bot/utils/llm.py: Inject current Discord activity into system prompt - cat-plugins/: Forward Discord activity through working_memory to personality plugins - bot/persona//preamble.txt: Add Discord status usage guidelines for character prompts - llama-swap-rocm-config.yaml: Add qwen3.5 model entry for ComfyUI prompt generation - AGENTS.md: New project documentation file	2026-05-27 01:18:12 +03:00
koko210Serve	d333c61c8f	fix: set default bedtime end time to 11 PM (was 9 PM)	2026-05-22 21:40:01 +03:00
koko210Serve	e1f81e52e5	Fix Miku confusing who said what in conversations Three interrelated fixes for speaker attribution confusion: 1. Fix misleading episodic memory header (discord_bridge.py): The Cat core hardcodes '## Context of things the Human said in the past:' when formatting recalled conversations. Our plugins store BOTH user messages ([User]: prefix) AND Miku's own responses ([Miku]: prefix) in episodic memory. This misleading header primes the LLM to attribute Miku's words to the user. Replaced with '## Past conversation excerpts (prefixed by who said what):' which accurately describes the mixed-speaker content. 2. Tighten episodic recall (discord_bridge.py): Added before_cat_recalls_episodic_memories hook setting threshold=0.75 (vs default 0.7) to reduce the chance of Miku's own just-uttered response being recalled on the very next user message, which would feed her own words back as misleading context. 3. Add role clarification (miku_personality.py & evil_miku_personality.py): Added a clarifying note after '# Conversation until now:' in the prompt suffix to explicitly tell the model that 'Human = the user, AI = you (Miku)', helping it reconcile the two labeling systems (episodic [User]/[Miku] prefixes vs conversation history Human/AI roles).	2026-05-22 16:38:34 +03:00
koko210Serve	201f2e3df5	feat(ui): add model selection UI to LLM Settings tab - Three dropdowns for Regular Miku, Evil Miku, Japanese Mode models - GPU availability badges (Both GPUs / NVIDIA Only / AMD Only) - Refresh Models + Refresh Status buttons - Load models on tab switch with defensive checks - Bump cache-busting version for all JS files - Remove redundant Current Status section	2026-05-20 13:55:35 +03:00
koko210Serve	b017a0ec04	feat(api): register models_selector routes - Import and include models_selector router in FastAPI app	2026-05-20 13:55:29 +03:00
koko210Serve	6bf9a30c33	feat(routes): sync model globals via config API, fix log message - Add models.text, models.evil, models.japanese to config/set globals sync - Fix language toggle log to show actual model name instead of hardcoded string	2026-05-20 13:55:22 +03:00
koko210Serve	8e5260561a	feat(config): persist model selections via config_manager - Add models.text, models.evil, models.japanese to restore_runtime_settings - Add model keys to reset_to_defaults with CONFIG defaults - Include model info in runtime_state for API visibility	2026-05-20 13:55:11 +03:00
koko210Serve	b4737c1ae1	fix(cat): use configured models instead of hardcoded strings - switch_to_evil_personality now reads EVIL_TEXT_MODEL from globals - switch_to_normal_personality now reads TEXT_MODEL from globals - Removes desync risk when user changes models via Web UI	2026-05-20 13:55:05 +03:00
koko210Serve	ae4e40f2d7	feat(models): add model selection API endpoints - GET /models/available: query both llama-swap instances for model lists - POST /models/select: set per-persona model (regular/evil/japanese) with persistence - GET /models/status: return current per-persona model assignments - Fall back to known model list when containers are unreachable	2026-05-20 13:54:59 +03:00
koko210Serve	7cb21a372b	feat: make Web UI mood dropdowns Evil Mode-aware - Disable per-server mood controls when Evil Miku is active - Show explanatory notice for disabled server mood dropdowns - Populate global mood dropdown with evil moods when Evil Mode is on - Fix initialization race condition by awaiting evil mode status first - Add CSS styles for disabled mood controls	2026-05-18 21:43:44 +03:00
koko210Serve	27f0659cc8	fix: Evil Miku now ignores questions — restore engagement + remove suffix hammer Preamble: - Sentence limit 1-3 → 2-4 (revert to original 'sting, then land' range) - Remove 'if you can say it in one, say it in one' (encouraged lazy dismissals) - Add engagement rule: 'Always engage with what was said — acknowledge the question or statement, then twist the knife. Ignoring isn\'t sharp, it\'s lazy.' Suffix: - Remove '[Keep responses short and cutting — 1-3 sentences. No monologues.]' The suffix was the LAST thing the model processed, so its brevity hammer overpowered the preamble's engagement instruction. Preamble alone is enough.	2026-05-18 11:09:24 +03:00
koko210Serve	6b6d705024	fix: Evil Miku wordiness regression from username prompt changes Five targeted fixes: 1. discord_bridge (priority 100): Skip 'cheerful virtual idol' wrapper and 'CRITICAL INSTRUCTION' about facts when evil_mode is active. Evil Miku gets her own prompt from evil_miku_personality plugin. 2. memory_consolidation (priority 10): Soften fact-usage pressure: 'Use THESE facts when answering' → 'You may reference these facts if relevant to the conversation'. Also soften username command tone. 3. evil_miku_personality (priority 100→101): Bump above discord_bridge so Evil Miku's prefix replacement deterministically discards any Miku-mode wrappers regardless of plugin load order. 4. evil preamble: Restructure for brevity — add 'Be SHORT and SHARP' declaration, move RESPONSE RULES before mood, tighten sentence limit from 2-4 to 1-3 with 'if you can say it in one, say it in one.' 5. evil suffix: Add final brevity reminder '[Keep responses short and cutting — 1-3 sentences. No monologues.]' right before conversation for maximum recency influence.	2026-05-18 10:56:07 +03:00
koko210Serve	e091fc1417	fix: use discord_consolidation user ID for consolidation WS connection Cat's WebSocket handler returns HTTP 500 for user IDs without the 'discord_' prefix. The consolidation WS was using 'system_consolidation' which failed immediately with WSServerHandshakeError (500). Changed to 'discord_consolidation' which connects successfully.	2026-05-17 11:52:03 +03:00
koko210Serve	a39aca2415	fix: make consolidation API async with background task + increased timeout Three fixes for consolidation reliability: 1. Fire-and-forget API: POST /memory/consolidate now launches consolidation as an asyncio background task and returns immediately. The old approach blocked until Cat's WS response, which could take 5+ minutes (LLM extraction calls), exceeding both the WS timeout and browser fetch timeout. Web UI now polls /memory/status to track completion. 2. Increased timeout: cat_client.trigger_consolidation() timeout raised from 300s to 600s (configurable via parameter). Logs unexpected WS message types for debugging. 3. Better logging: Consolidation log messages prefixed with 🌙 for grep-friendliness. cat_client errors include exc_info=True for traceback visibility. Web UI shows elapsed time while polling.	2026-05-17 11:31:26 +03:00
koko210Serve	46ea4f2c53	fix(memory): prevent stale name facts from overriding Discord display name Two bugs were causing Miku to call users by wrong names: BUG 1 - No authoritative source: Declarative name facts ('The user's name is Lily') were injected into the prompt without any counterweight. If an old consolidation run extracted a wrong name, Miku would believe it forever. Fix: agent_prompt_prefix now appends the user's Discord display name as AUTHORITATIVE context, with explicit instruction to prefer it over any contradictory name facts. BUG 2 - Dedup prevented name updates: _is_duplicate_fact() used vector similarity to detect duplicates. 'The user's name is Lily' and 'The user's name is koko210Serve' are ~80% identical text, giving high cosine similarity (>0.85 threshold). New correct name facts were silently rejected as 'duplicates'. Fix: name facts now use _find_existing_fact() to compare fact_value directly. If the name changed, old fact is deleted and new one stored. Also: the extraction prompt now includes the user's Discord display name as a hint, so the LLM knows the authoritative name when extracting facts during consolidation.	2026-05-17 11:20:49 +03:00
koko210Serve	5f06758c3e	fix: sync llm.py inline fallback with updated preamble.txt The MOOD GUIDELINES section in the emergency fallback (used only when preamble.txt is missing) still had the old wording. Now matches.	2026-05-15 14:46:29 +03:00
koko210Serve	8b3bc02f9e	refactor: DRY system prompts into shared preamble files Step 4 of memory system overhaul: single source of truth for prompts. Problem: The system prompt was defined inline in 4 different places: miku_personality.py, evil_miku_personality.py, llm.py, discord_bridge.py. These could drift out of sync — and the discord_bridge WebUI reconstruction was already missing CRITICAL RULES, CHARACTER CONTEXT, MOOD GUIDELINES, and RESPONSE RULES sections. Fix: - Create persona/miku/preamble.txt — canonical normal Miku preamble - Create persona/evil/preamble.txt — canonical evil Miku preamble (with {mood_name} and {mood_description} format placeholders) - All 5 consumers now read from these files: * miku_personality.py (Cat plugin, primary path) * evil_miku_personality.py (Cat plugin, primary path) * discord_bridge.py (WebUI 'Last Prompt' reconstruction) * llm.py (fallback path, normal Miku) * evil_mode.py get_evil_system_prompt() (fallback path, evil Miku) - All consumers include graceful fallbacks if preamble files are missing - Fixed evil_mode.py discrepancy: 'body and size' now matches canonical The preamble files are Docker volume-mounted into both containers: bot/persona/ → /app/persona/ (bot, via Dockerfile COPY) bot/persona/ → /app/cat/data/ (Cat, via docker-compose volume mount) Editing the preamble file on the host immediately updates the Cat path (bot path requires rebuild due to COPY).	2026-05-15 14:43:19 +03:00
koko210Serve	e7ec82d154	feat(memory): add [User]: prefix to user messages for speaker clarity Prevents Miku from confusing her own words with what users said. User messages stored by discord_bridge now get a '[User]: ' prefix on page_content, mirroring the existing '[Miku]: ' prefix on Miku's own responses. When episodic memories are recalled via RAG and injected into the prompt, the LLM can now clearly distinguish: [User]: I like pizza [Miku]: That's great! What toppings do you like? Without this, raw user text looked identical to Miku's text in the recalled memory context, causing potential confusion about who said what. The consolidation classifier strips the [User]: prefix before analyzing content, so word counts and pattern matching remain accurate.	2026-05-15 14:13:29 +03:00
koko210Serve	5a740c9334	feat(memory): hybrid trivial-message classifier (heuristics + LLM batch) Step 3 of memory system overhaul: smart junk detection. Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid: TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM): 50+ exact-match patterns, pure emoji, single char, punctuation-only TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM): 8+ words, question with substance, first-person statements, numbers/dates, links, mentions TIER 3 - BORDERLINE (batch → LLM for economical classification): 2-7 word messages without clear markers Compact prompt: ~150-200 tokens per 20-message batch Safety default: KEEP on any parsing error Real-time filtering (discord_bridge) uses conservative heuristics only: - 1-char, pure reactions, single emoji, custom emoji-only - 50+ single-word fillers - Never deletes multi-word messages in real-time - Philosophy: false negatives (junk stored) > false positives (data lost) Consolidation gets the full hybrid pipeline with LLM for borderline cases, achieving much better accuracy than the old 44% while keeping token costs minimal (LLM only called during nightly consolidation, not real-time chat).	2026-05-15 14:07:35 +03:00