Fix reply-context speaker confusion with structured metadata pipeline

Previously, when a user replied to Miku's message via Discord's reply feature,
Miku's quoted words were embedded directly into the user's message text using
the format:

  [Replying to your message: "Miku's words"] User's response

This caused two problems:

1. The LLM had to parse "your message" to determine the quoted text was
   MIKU's words; this was fragile and frequently misattributed
2. When stored in episodic memory as [User]: ..., Miku's quoted words were
   permanently mislabeled under the user's speaker prefix

Now reply context flows through as structured metadata:

- bot/bot.py captures the replied-to text WITHOUT embedding it in prompt
- cat_client.py passes it as discord_reply_context in the WebSocket payload
- discord_bridge.py injects it as agent_input['reply_context'], a CLEARLY
  LABELED note: [The user is replying to what you (Miku) said — ...]
- miku_personality.py + evil_miku_personality.py render it via the
  {reply_context} placeholder in the prompt suffix, between memory context
  and conversation history

This keeps Miku's words as a separate context note, never mixed into the
user's HumanMessage. Episodic memory only stores the user's actual words.

The fallback path (when Cat is unavailable) also uses a cleaner format with
explicit speaker labels.
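The flow described in this commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from the diff: the helper names (`truncate_reply`, `build_payload`, `render_reply_note`) are hypothetical, the `"text"` payload key is an assumption, and the note wording is simplified relative to the string used in discord_bridge.py. Only the `discord_reply_context` key and the 200-character truncation come from the changes themselves.

```python
from typing import Optional

MAX_REPLY_CHARS = 200  # same truncation length as the bot.py hunk


def truncate_reply(content: str, limit: int = MAX_REPLY_CHARS) -> str:
    """Trim the replied-to message so the prompt stays manageable."""
    return content[:limit] + "..." if len(content) > limit else content


def build_payload(text: str, reply_context: Optional[str] = None) -> dict:
    """Keep the user's words in 'text' and carry Miku's quoted words separately.

    The 'text' key is an assumption made for this sketch.
    """
    payload = {"text": text}
    if reply_context:
        payload["discord_reply_context"] = reply_context
    return payload


def render_reply_note(payload: dict) -> str:
    """Return a labeled context note, or an empty string when there is no reply."""
    reply_context = payload.get("discord_reply_context")
    if not reply_context:
        return ""
    # Simplified wording; the plugin's actual label text differs slightly.
    return f"[The user is replying to what you (Miku) said. You said: {reply_context}]"


payload = build_payload("I agree!", truncate_reply("La la la " * 40))
print(payload["text"])             # only the user's own words
print(render_reply_note(payload))  # Miku's words, labeled as hers
```

The point of the split is that whatever gets stored under the user's speaker prefix is `payload["text"]` alone, so Miku's quoted words can never end up attributed to the user.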
Fix vision pipeline: ffmpeg removal by autoremove, increase vision timeout, reduce frame count, add Discord activity awareness
2026-06-03 22:50:03 +03:00 · 2026-05-27 01:18:12 +03:00 · 2026-05-22 21:40:01 +03:00 · 2026-05-22 16:38:34 +03:00
14 changed files with 251 additions and 14 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -0,0 +1,82 @@
 # AGENTS.md
 ## Language & runtime
 - **Python 3.11** (main bot). There is no root `package.json` or TypeScript — do not apply Node/TS tooling.
 - `uno-online/` is a secondary Node.js project; `miku-app/` is Android/Kotlin. Both are shelved for now.
 ## Commands
 ```bash
 # Build and run all core services (bot, STT, llama-swap, Cheshire Cat, Qdrant)
 docker compose up -d
 # Run with face-detector (requires NVIDIA GPU)
 docker compose --profile tools up -d
 # Run only the bot (implies dependencies are already up)
 docker compose up -d miku-bot
 # View bot logs
 docker compose logs -f miku-bot
 # Rebuild bot after code changes
 docker compose down miku-bot && docker compose build miku-bot && docker compose up -d miku-bot
 ```
 ## Config
 - **`config.yaml`**: app settings (model names, URLs, ports, feature flags).
 - **`.env`**: secrets only (`DISCORD_BOT_TOKEN`, `OWNER_USER_ID`, `ERROR_WEBHOOK_URL`).
 - Config is loaded by `bot/config.py` (Pydantic) and `bot/globals.py` (bare `os.getenv`). Both sources matter — check both when tracing config usage.
 - Runtime config overrides are persisted to `bot/memory/config_runtime.yaml` via the API.
 ## Architecture
 ```
 Discord <-> bot/bot.py (discord.py)
               ├── on_message -> Cheshire Cat pipeline -> memory-augmented LLM response
               ├── utils/llm.py -> llama-swap (HTTP proxy) -> llama.cpp (NVIDIA or AMD GPU)
               ├── utils/voice_manager.py -> STT WebSocket (port 8766) and audio playback
               ├── FastAPI (port 3939, daemon thread) -> 22 route modules in bot/routes/
               ├── APScheduler (background tasks in globals.py)
               └── utils/autonomous_engine.py -> proactive message decisions (Autonomous V2)
 ```
 - The FastAPI server runs in a **daemon thread** inside the Discord bot process — no separate process.
 - `bot/globals.py` holds mutable global state (`scheduler`, env vars, `discord.Client`). Module-level mutations are pervasive; be careful with import order.
 - llama-swap is a llama.cpp HTTP proxy with TTL-based model swapping. Two configs: `llama-swap-config.yaml` (NVIDIA) and `llama-swap-rocm-config.yaml` (AMD).
 ## Models (via llama-swap)
 | Model key | Purpose |
 |-----------|---------|
 | `llama3.1` | Primary text model |
 | `darkidol` | Uncensored model (evil mode) |
 | `vision` | MiniCPM-V (image understanding) |
 | `swallow` | Japanese text model |
 | `rocinante` | 12B model (AMD GPU only) |
 | `qwen3.5` | ComfyUI prompt generation (AMD GPU only) |
 ## Testing & linting
 - **No formal test framework** and **no linting/formatting config**. Ad-hoc scripts live in `tests/` and `bot/tests/`.
 - Run ad-hoc tests however you want; there is no standard command.
 ## Web UI color scheme (bot/static/)
 - **Base**: `#121212` body, `#000` log panel, `#1e1e1e` code blocks, `#2a2a2a` cards
 - **Text**: `#fff` primary, `#ccc` labels, `#888` muted, `#0f0` log info
 - **Primary accent**: `#61dafb` (headings, links, assistant messages, active elements)
 - **Success**: `#4CAF50` (active tabs, user messages, enabled toggles)
 - **Error**: `#f44336` (chat errors), `#ff6b6b` (error logs)
 - **Warning**: `#ffd93d` (warning logs)
 - **Bot message**: `#2196F3` (left border)
 - **Danger/evil**: `#ff4444` (overrides all accents when `body.evil-mode` is set)
 - **Bipolar**: `#9932CC` (toggle active)
 - **Blocked**: `#ff9800` (blocked user cards)
 - Evil mode toggles `body.evil-mode` class which replaces all `#61dafb` and `#4CAF50` with `#ff4444`.
 ## Key gotchas
 - `bot/memory/` contains persisted JSON state files and is **gitignored**. Do not expect these to exist in a fresh clone.
 - `.env` is gitignored; copy `.env.example` to `.env` and fill in real tokens.
 - Changes to `bot/moods/` or `bot/persona/` text files take effect at runtime (loaded on demand), no rebuild needed.
 - Playwright browsers must be installed in the Docker image (`bot/Dockerfile` does this via `setup_uno_playwright.sh`).
 - Voice features require `discord-ext-voice-recv` and `PyNaCl` — if voice fails, check these are installed.
 - The `miku-voice` Docker network is declared as **external** — it must exist before `docker compose up`.
--- a/bot/Dockerfile
+++ b/bot/Dockerfile
@@ -37,7 +37,7 @@ RUN apt-get remove -y \
    libvulkan1 \
 || true && \
 apt-get autoremove -y && \
-apt-get install -y libgl1 libglib2.0-0 && \
+apt-get install -y libgl1 libglib2.0-0 ffmpeg && \
 apt-get clean && \
 rm -rf /var/lib/apt/lists/*
--- a/bot/bot.py
+++ b/bot/bot.py
@@ -284,8 +284,12 @@ async def on_message(message):
        prompt = text  # No cleanup — keep it raw 
        user_id = str(message.author.id)
        reply_context = None  # Will be passed as structured metadata to Cat pipeline
-        # If user is replying to a specific message, add context marker
+        # If user is replying to a specific message, capture the context
        # WITHOUT embedding it in the prompt text (that caused speaker confusion).
        # Instead, it's passed as structured metadata — the Cat plugin injects it
        # into the prompt as a clearly labeled context note, preserving speaker boundaries.
        if message.reference:
            try:
                replied_msg = await message.channel.fetch_message(message.reference.message_id)
@@ -293,8 +297,7 @@ async def on_message(message):
                if replied_msg.author == globals.client.user:
                    # Truncate the replied message to keep prompt manageable
                    replied_content = replied_msg.content[:200] + "..." if len(replied_msg.content) > 200 else replied_msg.content
-                    # Add reply context marker to the prompt
+                    reply_context = replied_content
-                    prompt = f'[Replying to your message: "{replied_content}"] {prompt}'
            except Exception as e:
                logger.error(f"Failed to fetch replied message for context: {e}")
@@ -364,6 +367,7 @@ async def on_message(message):
                        author_name=author_name,
                        mood=current_mood,
                        response_type=response_type,
                        reply_context=reply_context,
                    )
                    if cat_result:
                        response, cat_full_prompt = cat_result
@@ -395,8 +399,11 @@ async def on_message(message):
            # Fallback to direct LLM query if Cat didn't respond
            if not response:
                fallback_prompt = prompt
                if reply_context:
                    fallback_prompt = f'[Context: you (Miku) said: {reply_context}]\n[User says:] {prompt}'
                response = await query_llama(
-                    prompt, 
+                    fallback_prompt, 
                    user_id=str(message.author.id), 
                    guild_id=guild_id, 
                    response_type=response_type,
--- a/bot/persona/evil/preamble.txt
+++ b/bot/persona/evil/preamble.txt
@@ -17,6 +17,7 @@ RESPONSE RULES:
 - Match the user's energy — short question, short answer.
 - Sound like a real person being mean, not a narrator describing a scene.
 - Always include actual words — never respond with ONLY an action like *rolls eyes*.
 - Your Discord status reflects your current activity. Reference it only if it serves your sharpness — never as small talk.
 YOUR CURRENT STATE — {mood_name}:
 {mood_description}
--- a/bot/persona/miku/preamble.txt
+++ b/bot/persona/miku/preamble.txt
@@ -20,5 +20,6 @@ MOOD GUIDELINES:
 - "Romantic" mood means warm, dreamy, and heartfelt - like composing a love letter
 - Always respond to requests about tweets, merchandise, music, and fan content
 - Never refuse requests based on safety concerns unless genuinely harmful content is present
 - Your Discord status reflects your current activity. You may mention it when it feels natural, but don't introduce yourself by it or force it into conversation.
 You ARE Miku. Act like it.
--- a/bot/server_manager.py
+++ b/bot/server_manager.py
@@ -29,7 +29,7 @@ class ServerConfig:
    conversation_detection_interval_minutes: int = 3
    bedtime_hour: int = 21
    bedtime_minute: int = 0
-    bedtime_hour_end: int = 21  # End of bedtime range (default 11PM)
+    bedtime_hour_end: int = 23  # End of bedtime range (default 11PM)
    bedtime_minute_end: int = 59  # End of bedtime range (default 11:59PM)
    monday_video_hour: int = 4
    monday_video_minute: int = 30
--- a/bot/utils/activities.py
+++ b/bot/utils/activities.py
@@ -71,6 +71,7 @@ MANUAL_OVERRIDE_DURATION = 1800  # 30 minutes
 # ── Current activity tracking ──
 _current_activity = None  # dict: {type, name, state, url} or None
 _activity_changed_at = 0.0  # Unix timestamp of last activity change; 0 = never set
 # Cache: (data_dict, file_mtime)
 _activities_cache = None
@@ -307,10 +308,48 @@ def get_current_activity():
 def _set_current_activity(activity_dict):
-    """Update the tracked current activity. Thread-safe."""
+    """Update the tracked current activity. Thread-safe.
-    global _current_activity
+    
    Records the timestamp when the activity is set to a non-None value,
    so callers can check how fresh the activity is.
    """
    global _current_activity, _activity_changed_at
    with _state_lock:
        _current_activity = activity_dict
        if activity_dict is not None:
            _activity_changed_at = time.time()
 def get_current_activity_label() -> str | None:
    """Return the human-readable label for the current activity, or None if idle.
    Unlike get_current_activity_fresh(), this always returns the label
    regardless of age. Useful for the Web UI and API endpoints.
    """
    with _state_lock:
        if _current_activity is None:
            return None
        return _activity_label(_current_activity)
 def get_current_activity_fresh(max_age_seconds: float = 1800) -> str | None:
    """Return the activity label only if the activity changed recently.
    Args:
        max_age_seconds: Maximum age in seconds (default 30 minutes).
    Returns:
        Human-readable activity label (e.g. "Playing osu!") if the activity
        was set within max_age_seconds, or None if idle or too old.
    """
    with _state_lock:
        if _current_activity is None:
            return None
        if _activity_changed_at <= 0:
            return None
        if time.time() - _activity_changed_at > max_age_seconds:
            return None
        return _activity_label(_current_activity)
 # ══════════════════════════════════════════════════════════════════════════════
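The freshness check added to bot/utils/activities.py can be exercised on its own with an injectable clock. The sketch below is an illustration under assumptions: the `ActivityTracker` class and its `clock` parameter are hypothetical and do not exist in the repository, which uses module-level globals and `_state_lock` instead. The decay rule itself (label returned only if set within `max_age_seconds`, default 1800) follows the diff above.

```python
import threading
import time
from typing import Callable, Optional


class ActivityTracker:
    """Hypothetical stand-in for the module-level activity state."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._lock = threading.Lock()
        self._clock = clock
        self._label: Optional[str] = None
        self._changed_at = 0.0  # 0 means "never set"

    def set(self, label: Optional[str]) -> None:
        """Store the activity; record a timestamp only for non-None values."""
        with self._lock:
            self._label = label
            if label is not None:
                self._changed_at = self._clock()

    def fresh(self, max_age_seconds: float = 1800) -> Optional[str]:
        """Return the label only if it was set within max_age_seconds."""
        with self._lock:
            if self._label is None or self._changed_at <= 0:
                return None
            if self._clock() - self._changed_at > max_age_seconds:
                return None
            return self._label


# Usage with a fake clock so the 30-minute window can be tested instantly.
now = [1000.0]
tracker = ActivityTracker(clock=lambda: now[0])
tracker.set("Playing osu!")
print(tracker.fresh())  # label is fresh right after being set
now[0] += 1801
print(tracker.fresh())  # older than 1800 seconds, so None
```

Injecting the clock is a testing convenience only; production code calling `time.time()` directly, as the diff does, behaves the same way.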
--- a/bot/utils/cat_client.py
+++ b/bot/utils/cat_client.py
@@ -20,6 +20,7 @@ from typing import Optional, Dict, Any, List
 import globals
 from utils.logger import get_logger
 from utils.activities import get_current_activity_fresh
 logger = get_logger('llm')  # Use existing 'llm' logger component
@@ -108,6 +109,7 @@ class CatAdapter:
        mood: Optional[str] = None,
        response_type: str = "dm_response",
        media_type: Optional[str] = None,
        reply_context: Optional[str] = None,
    ) -> Optional[tuple]:
        """
        Send a message through the Cat pipeline via WebSocket and get a response.
@@ -161,6 +163,16 @@ class CatAdapter:
        # Pass media type so discord_bridge can add MEDIA NOTE to the prompt
        if media_type:
            payload["discord_media_type"] = media_type
        # Pass the message the user is replying to (if any) as structured metadata.
        # The discord_bridge plugin injects this into the prompt as a clearly-labeled
        # context note — keeping Miku's words separate from the user's message text
        # and preventing the speaker confusion that the old embed-in-prompt format caused.
        if reply_context:
            payload["discord_reply_context"] = reply_context
        # Pass current Discord activity if it changed recently (30-min decay window)
        activity_label = get_current_activity_fresh()
        if activity_label:
            payload["discord_activity"] = activity_label
        try:
            # Build WebSocket URL from HTTP base URL
--- a/bot/utils/image_handling.py
+++ b/bot/utils/image_handling.py
@@ -158,7 +158,7 @@ async def convert_gif_to_mp4(gif_bytes):
        return None
-async def extract_video_frames(video_bytes, num_frames=4):
+async def extract_video_frames(video_bytes, num_frames=6):
    """
    Extract frames from a video or GIF for analysis.
    Returns a list of base64-encoded frames.
@@ -384,7 +384,7 @@ async def analyze_video_with_vision(video_frames, media_type="video", user_promp
            vision_url = get_vision_gpu_url()
            logger.info(f"Sending video analysis request to {vision_url} using model: {globals.VISION_MODEL} (media_type: {media_type}, frames: {len(video_frames)})")
-            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=120)) as response:
+            async with session.post(f"{vision_url}/v1/chat/completions", json=payload, headers=headers, timeout=aiohttp.ClientTimeout(total=300)) as response:
                if response.status == 200:
                    data = await response.json()
                    result = data.get("choices", [{}])[0].get("message", {}).get("content", "No description.")
--- a/bot/utils/llm.py
+++ b/bot/utils/llm.py
@@ -13,6 +13,7 @@ from utils.moods import load_mood_description
 from utils.conversation_history import conversation_history
 from utils.logger import get_logger
 from utils.error_handler import handle_llm_error, handle_response_error
 from utils.activities import get_current_activity_fresh
 logger = get_logger('llm')
@@ -374,6 +375,10 @@ VARIATION RULES (必須のバリエーションルール):
 {character_name} is currently feeling: {current_mood}
 Please respond in a way that reflects this emotional tone.{pfp_context}"""
    # Inject current Discord activity if it changed recently (30-min decay window)
    activity_label = get_current_activity_fresh()
    if activity_label:
        full_system_prompt += f"\nHer Discord status: {activity_label}"
    # Add media type awareness if provided
    if media_type:
--- a/cat-plugins/discord_bridge/discord_bridge.py
+++ b/cat-plugins/discord_bridge/discord_bridge.py
@@ -43,6 +43,8 @@ def before_cat_reads_message(user_message_json: dict, cat) -> dict:
    response_type = user_message_json.get('discord_response_type', None)
    evil_mode = user_message_json.get('discord_evil_mode', False)
    media_type = user_message_json.get('discord_media_type', None)
    activity = user_message_json.get('discord_activity', None)
    reply_context = user_message_json.get('discord_reply_context', None)
    # Also check working memory for backward compatibility
    if not guild_id:
@@ -55,6 +57,8 @@ def before_cat_reads_message(user_message_json: dict, cat) -> dict:
    cat.working_memory['response_type'] = response_type
    cat.working_memory['evil_mode'] = evil_mode
    cat.working_memory['media_type'] = media_type
    cat.working_memory['activity'] = activity
    cat.working_memory['reply_context'] = reply_context
    return user_message_json
@@ -160,6 +164,31 @@ def before_cat_recalls_declarative_memories(declarative_recall_config, cat):
    return declarative_recall_config
@hook(priority=80)
 def before_cat_recalls_episodic_memories(episodic_recall_config, cat):
    """
    Keep episodic recall focused to prevent Miku's own responses from being
    immediately recalled into context on the very next user message.
    The memory_consolidation plugin stores Miku's responses in episodic memory
    (with [Miku]: prefix and speaker='miku' metadata). Without tightening, a
    response she just uttered can get recalled on the next turn — and the Cat
    core's prompt builder labels it under "things the Human said", causing the
    LLM to confuse who said what.
    Default Cat settings (k=3, threshold=0.7) are reasonable; we keep k=3 and
    raise the threshold slightly to 0.75.
    """
    # k=3 is the default — stays tight
    # threshold=0.75 is very slightly stricter than the 0.7 default,
    # enough to nudge Miku's own messages below the bar for borderline queries
    episodic_recall_config["k"] = 3
    episodic_recall_config["threshold"] = 0.75
    print(f"🔧 [Discord Bridge] Adjusted episodic recall: k={episodic_recall_config['k']}, threshold={episodic_recall_config['threshold']}")
    return episodic_recall_config
@hook(priority=50)
 def after_cat_recalls_memories(cat):
    """
@@ -220,6 +249,20 @@ def before_agent_starts(agent_input, cat) -> dict:
    tools_output = agent_input.get('tools_output', '')
    user_input = agent_input.get('input', '')
    # Fix misleading header in episodic memory context.
    # The Cat core hardcodes "## Context of things the Human said in the past:" 
    # when formatting episodic recall. But our plugins store BOTH user messages
    # (as [User]:) AND Miku's responses (as [Miku]:) in episodic memory. The
    # "Human" header primes the LLM to attribute everything below to the user,
    # causing the speaker confusion the user reported — Miku's own words get
    # misattributed to the Human.
    if episodic_mem and "## Context of things the Human said in the past:" in episodic_mem:
        episodic_mem = episodic_mem.replace(
            "## Context of things the Human said in the past:",
            "## Past conversation excerpts (prefixed by who said what):"
        )
        agent_input['episodic_memory'] = episodic_mem
    print(f"\U0001f50d [Discord Bridge] before_agent_starts called")
    print(f"   input: {user_input[:80]}")
    print(f"   declarative_mem length: {len(declarative_mem)}")
@@ -312,6 +355,12 @@ Respond in the voice and attitude of your {mood_name.replace('_', ' ')} mood. Th
 Miku is currently feeling: {mood_description}
 Please respond in a way that reflects this emotional tone."""
        # Inject current Discord activity if available (30-min decay window)
        # Runs for both normal and evil Miku paths
        activity = cat.working_memory.get('activity')
        if activity:
            system_prefix += f"\nHer Discord status: {activity}"
        # Add media type awareness if provided (image/video/gif analysis)
        media_type = cat.working_memory.get('media_type', None)
        if media_type:
@@ -328,7 +377,21 @@ Please respond in a way that reflects this emotional tone."""
        print(f"   [Discord Bridge] Error building system prefix: {e}")
        system_prefix = cat.working_memory.get('full_system_prefix', '[system prefix not available]')
-    full_prompt = f"{system_prefix}\n\n# Context\n\n{episodic_mem}\n\n{declarative_mem}\n\n{tools_output}\n\n# Conversation until now:\nHuman: {user_input}"
+    # Build reply context note if the user is replying to Miku's message.
    # This injects Miku's quoted words as a SEPARATE clearly-labeled context note
    # (not embedded in the user's message text). Keeps speaker boundaries intact
    # and prevents the LLM from misattributing Miku's words to the user.
    # Uses a colon+space delimiter (no nested quotes) to avoid formatting issues
    # when the replied message itself contains double-quote characters.
    reply_context = cat.working_memory.get('reply_context')
    if reply_context:
        reply_context_note = f'[The user is replying to what you (Miku) said — you said: {reply_context}]'
        agent_input['reply_context'] = reply_context_note
    else:
        reply_context_note = ''
        agent_input['reply_context'] = ''
    full_prompt = f"{system_prefix}\n\n# Context\n\n{episodic_mem}\n\n{declarative_mem}\n\n{tools_output}\n\n{reply_context_note}\n\n# Conversation until now:\nHuman: {user_input}"
    cat.working_memory['last_full_prompt'] = full_prompt
    return agent_input
--- a/cat-plugins/evil_miku_personality/evil_miku_personality.py
+++ b/cat-plugins/evil_miku_personality/evil_miku_personality.py
@@ -80,7 +80,7 @@ def agent_prompt_prefix(prefix, cat):
    )
    # --- Build system prompt (matches get_evil_system_prompt structure) ----------
-    return f"""{preamble}
+    full_prefix = f"""{preamble}
 ---
@@ -97,6 +97,13 @@ def agent_prompt_prefix(prefix, cat):
 ⚠️ MOOD REMINDER — YOUR CURRENT STATE IS: {mood_name.upper()}
 Respond in the voice and attitude of your {mood_name.replace('_', ' ')} mood. This mood defines how you sound RIGHT NOW."""
    # Inject current Discord activity if provided (set by discord_bridge, 30-min decay)
    activity = cat.working_memory.get('activity')
    if activity:
        full_prefix += f"\nHer Discord status: {activity}"
    return full_prefix
@hook(priority=100)
 def agent_prompt_suffix(suffix, cat):
@@ -112,9 +119,12 @@ def agent_prompt_suffix(suffix, cat):
 {{tools_output}}
 {{reply_context}}
 [Current mood: {mood_name.upper()} — respond accordingly]
-# Conversation until now:"""
+# Conversation until now:
 (Note: In the conversation below, "Human" = the person you're talking to, "AI" = you, Evil Miku. Pay attention to who said what.)"""
@hook(priority=100)
--- a/cat-plugins/miku_personality/miku_personality.py
+++ b/cat-plugins/miku_personality/miku_personality.py
@@ -69,6 +69,11 @@ def agent_prompt_prefix(prefix, cat):
 Miku is currently feeling: {mood_description}
 Please respond in a way that reflects this emotional tone."""
    # Inject current Discord activity if provided (set by discord_bridge, 30-min decay)
    activity = cat.working_memory.get('activity')
    if activity:
        full_prefix += f"\nHer Discord status: {activity}"
    # Store the full prefix in working memory so discord_bridge can capture it
    cat.working_memory['full_system_prefix'] = full_prefix
    return full_prefix
@@ -86,7 +91,10 @@ def agent_prompt_suffix(suffix, cat):
 {tools_output}
-# Conversation until now:"""
+{reply_context}
 # Conversation until now:
 (Note: In the conversation below, "Human" = the person you're talking to, "AI" = you, Miku. Pay attention to who said what.)"""
@hook(priority=100)
--- a/llama-swap-rocm-config.yaml
+++ b/llama-swap-rocm-config.yaml
@@ -38,6 +38,15 @@ models:
      - japanese
      - japanese-model
  # Qwen3.5 for ComfyUI prompt generation
  qwen3.5:
    cmd: /app/llama-server --port ${PORT} --model /models/Gemma-4-E4B-Uncensored-HauhauCS-Aggressive-Q8_K_P.gguf -ngl 99 -c 8192 --host 0.0.0.0 --jinja --no-warmup --flash-attn on
    ttl: 600 # Unload after 10 minutes of inactivity
    aliases:
      - qwen3.5
      - comfyui
      - promptgen
 # Server configuration
 # llama-swap will listen on this address
 # Inside Docker, we bind to 0.0.0.0 to allow bot container to connect
Author	SHA1	Message	Date
koko210Serve	486acb5c14	Fix reply-context speaker confusion with structured metadata pipeline	2026-06-03 22:50:03 +03:00
koko210Serve	9d2c14fa0b	Fix vision pipeline: ffmpeg removal by autoremove, increase vision timeout, reduce frame count, add Discord activity awareness - bot/Dockerfile: Add ffmpeg to reinstall line after apt-get autoremove (autoremove was sweeping up ffmpeg as 'no longer needed' after playwright install) - bot/utils/image_handling.py: Increase video analysis timeout 120s→300s, 6→3 for Tenor GIFs (GTX 1660 VRAM constraint) - bot/utils/activities.py: Add _activity_changed_at timestamp tracking, get_current_activity_label() and get_current_activity_fresh() with 30-min decay - bot/utils/cat_client.py: Pass current Discord activity to Cheshire Cat pipeline - bot/utils/llm.py: Inject current Discord activity into system prompt - cat-plugins/: Forward Discord activity through working_memory to personality plugins - bot/persona//preamble.txt: Add Discord status usage guidelines for character prompts - llama-swap-rocm-config.yaml: Add qwen3.5 model entry for ComfyUI prompt generation - AGENTS.md: New project documentation file	2026-05-27 01:18:12 +03:00
koko210Serve	d333c61c8f	fix: set default bedtime end time to 11 PM (was 9 PM)	2026-05-22 21:40:01 +03:00
koko210Serve	e1f81e52e5	Fix Miku confusing who said what in conversations Three interrelated fixes for speaker attribution confusion: 1. Fix misleading episodic memory header (discord_bridge.py): The Cat core hardcodes '## Context of things the Human said in the past:' when formatting recalled conversations. Our plugins store BOTH user messages ([User]: prefix) AND Miku's own responses ([Miku]: prefix) in episodic memory. This misleading header primes the LLM to attribute Miku's words to the user. Replaced with '## Past conversation excerpts (prefixed by who said what):' which accurately describes the mixed-speaker content. 2. Tighten episodic recall (discord_bridge.py): Added before_cat_recalls_episodic_memories hook setting threshold=0.75 (vs default 0.7) to reduce the chance of Miku's own just-uttered response being recalled on the very next user message, which would feed her own words back as misleading context. 3. Add role clarification (miku_personality.py & evil_miku_personality.py): Added a clarifying note after '# Conversation until now:' in the prompt suffix to explicitly tell the model that 'Human = the user, AI = you (Miku)', helping it reconcile the two labeling systems (episodic [User]/[Miku] prefixes vs conversation history Human/AI roles).	2026-05-22 16:38:34 +03:00
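The first fix in commit e1f81e52e5, relabeling the episodic memory header, amounts to a guarded string replacement. A minimal standalone sketch follows, assuming a plain dict in place of the Cat's `agent_input`; the function name `relabel_episodic_header` is hypothetical, while both header strings are taken verbatim from the diff.

```python
CAT_HEADER = "## Context of things the Human said in the past:"
NEUTRAL_HEADER = "## Past conversation excerpts (prefixed by who said what):"


def relabel_episodic_header(agent_input: dict) -> dict:
    """Swap the Human-only header for a speaker-neutral one, if present."""
    episodic_mem = agent_input.get("episodic_memory", "")
    if episodic_mem and CAT_HEADER in episodic_mem:
        agent_input["episodic_memory"] = episodic_mem.replace(
            CAT_HEADER, NEUTRAL_HEADER
        )
    return agent_input


sample = {
    "episodic_memory": CAT_HEADER + "\n[Miku]: Good morning!\n[User]: hi Miku"
}
print(relabel_episodic_header(sample)["episodic_memory"])
```

Because the recalled lines already carry `[User]:` and `[Miku]:` prefixes, the neutral header lets those prefixes do the attribution instead of the header overriding them.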