Previously, when a user replied to Miku's message via Discord's reply
feature, Miku's quoted words were embedded directly into the user's
message text using the format:
[Replying to your message: "Miku's words"] User's response
This caused two problems:
1. The LLM had to parse "your message" to determine the quoted text
was MIKU's words — fragile and frequently misattributed
2. When stored in episodic memory as [User]: ..., Miku's quoted words
were permanently mislabeled under the user's speaker prefix
Now reply context flows through as structured metadata:
- bot/bot.py captures the replied-to text WITHOUT embedding it in prompt
- cat_client.py passes it as discord_reply_context in the WebSocket payload
- discord_bridge.py injects it as agent_input['reply_context'] — a
CLEARLY LABELED note: [The user is replying to what you (Miku) said — ...]
- miku_personality.py + evil_miku_personality.py render it via
{reply_context} placeholder in the prompt suffix, between memory
context and conversation history
This keeps Miku's words as a separate context note, never mixed into
the user's HumanMessage. Episodic memory only stores the user's actual
words. The fallback path (when Cat is unavailable) also uses a cleaner
format with explicit speaker labels.
- bot/Dockerfile: Add ffmpeg to reinstall line after apt-get autoremove
(autoremove was sweeping up ffmpeg as 'no longer needed' after playwright install)
- bot/utils/image_handling.py: Increase video analysis timeout 120s→300s, 6→3 for Tenor GIFs (GTX 1660 VRAM constraint)
- bot/utils/activities.py: Add _activity_changed_at timestamp tracking,
get_current_activity_label() and get_current_activity_fresh() with 30-min decay
- bot/utils/cat_client.py: Pass current Discord activity to Cheshire Cat pipeline
- bot/utils/llm.py: Inject current Discord activity into system prompt
- cat-plugins/*: Forward Discord activity through working_memory to personality plugins
- bot/persona/*/preamble.txt: Add Discord status usage guidelines for character prompts
- llama-swap-rocm-config.yaml: Add qwen3.5 model entry for ComfyUI prompt generation
- AGENTS.md: New project documentation file
- switch_to_evil_personality now reads EVIL_TEXT_MODEL from globals
- switch_to_normal_personality now reads TEXT_MODEL from globals
- Removes desync risk when user changes models via Web UI
Cat's WebSocket handler returns HTTP 500 for user IDs without the
'discord_' prefix. The consolidation WS was using 'system_consolidation'
which failed immediately with WSServerHandshakeError (500). Changed to
'discord_consolidation' which connects successfully.
Three fixes for consolidation reliability:
1. Fire-and-forget API: POST /memory/consolidate now launches consolidation
as an asyncio background task and returns immediately. The old approach
blocked until Cat's WS response, which could take 5+ minutes (LLM
extraction calls), exceeding both the WS timeout and browser fetch
timeout. Web UI now polls /memory/status to track completion.
2. Increased timeout: cat_client.trigger_consolidation() timeout raised
from 300s to 600s (configurable via parameter). Logs unexpected WS
message types for debugging.
3. Better logging: Consolidation log messages prefixed with 🌙 for
grep-friendliness. cat_client errors include exc_info=True for
traceback visibility. Web UI shows elapsed time while polling.
- Fix silent None return in analyze_image_with_vision exception handler
- Add None/empty guards after vision analysis in bot.py (image, video, GIF, Tenor)
- Route all image/video/GIF responses through Cheshire Cat pipeline (was
calling query_llama directly), enabling episodic memory storage for media
interactions and correct Last Prompt display in Web UI
- Add media_type parameter to cat_adapter.query() and forward as
discord_media_type in WebSocket payload
- Update discord_bridge plugin to read media_type from payload and inject
MEDIA NOTE into system prefix in before_agent_starts hook
- Add _extract_vision_question() helper to strip Discord mentions and bot-name
triggers from user message; pass cleaned question to vision model so specific
questions (e.g. 'what is the person wearing?') go directly to the vision model
instead of the generic 'Describe this image in detail.' fallback
- Pass user_prompt to all analyze_image_with_qwen / analyze_video_with_vision
call sites in bot.py (image, video, GIF, Tenor, embed paths)
- Fix autonomous reaction loops skipping messages that @mention the bot or have
media attachments in DMs, preventing duplicate vision model calls for images
already being processed by the main message handler
- Increase vision max_tokens: images 300->800, video/GIF 400->1000 (no VRAM
impact; KV cache is pre-allocated at model load time)
- discord_bridge before_agent_starts now checks evil_mode from
working_memory to load the correct personality files:
Normal: miku_lore/prompt/lyrics + /app/moods/{mood}.txt
Evil: evil_miku_lore/prompt/lyrics + /app/moods/evil/{mood}.txt
- Reads files directly instead of relying on cross-plugin working_memory
- cat_client.query() returns (response, full_prompt) tuple
- Full prompt includes system prefix + recalled memories + conversation
- API /prompt/cat returns full_prompt field
Bot was calling restore_evil_cat_state() in on_ready() before Cheshire
Cat finished booting (~25s), causing all plugin toggle API calls to fail
silently. Evil Miku plugin was left disabled and the bot used Cat's
default personality instead.
Changes:
- cat_client.py: add wait_for_ready() that polls Cat health endpoint
every 5s for up to 120s before attempting any admin API calls
- evil_mode.py: rewrite restore_evil_cat_state() with:
- wait_for_ready() gate before any plugin/model switching
- 3-second extra delay after Cat is up (plugin registry fully loaded)
- up to 3 retries on failure
- post-switch verification that the correct plugins are actually active
Also fixes helcyon model references that leaked into the container image
(cat_client.py was switching Cat's LLM to 'helcyon' which has no
llama-swap handler; reverted to correct 'darkidol' / 'llama3.1').
MOOD SYSTEM FIX:
- Mount bot/moods directory in docker-compose.yml for Cat container access
- Update miku_personality plugin to load mood descriptions from .txt files
- Add Cat logger for debugging mood loading (replaces print statements)
- Moods now dynamically loaded from working_memory instead of hardcoded neutral
**Critical Bug Fixes:**
1. Per-user memory isolation bug
- Changed CatAdapter from HTTP POST to WebSocket /ws/{user_id}
- User_id now comes from URL path parameter (true per-user isolation)
- Verified: Different users can't see each other's memories
2. Memory API 405 errors
- Replaced non-existent Cat endpoint calls with Qdrant direct queries
- get_memory_points(): Now uses POST /collections/{collection}/points/scroll
- delete_memory_point(): Now uses POST /collections/{collection}/points/delete
3. Memory stats showing null counts
- Reimplemented get_memory_stats() to query Qdrant directly
- Now returns accurate counts: episodic: 20, declarative: 6, procedural: 4
4. Miku couldn't see usernames
- Modified discord_bridge before_cat_reads_message hook
- Prepends [Username says:] to every message text
- LLM now knows who is texting: [Alice says:] Hello Miku!
5. Web UI Memory tab layout
- Tab9 was positioned outside .tab-container div (showed to the right)
- Moved tab9 HTML inside container, before closing divs
- Memory tab now displays below tab buttons like other tabs
**Code Changes:**
bot/utils/cat_client.py:
- Line 25: Logger name changed to 'llm' (available component)
- get_memory_stats() (lines 256-285): Query Qdrant directly via HTTP GET
- get_memory_points() (lines 275-310): Use Qdrant POST /points/scroll
- delete_memory_point() (lines 350-370): Use Qdrant POST /points/delete
cat-plugins/discord_bridge/discord_bridge.py:
- Fixed .pop() → .get() (UserMessage is Pydantic BaseModelDict)
- Added before_cat_reads_message logic to prepend [Username says:]
- Message format: [Alice says:] message content
Dockerfile.llamaswap-rocm:
- Lines 37-44: Added conditional check for UI directory
- if [ -d ui ] before npm install && npm run build
- Fixes build failure when llama-swap UI dir doesn't exist
bot/static/index.html:
- Moved tab9 from lines 1554-1688 (outside container)
- To position before container closing divs (now inside)
- Memory tab button at line 673: 🧠 Memories
**Testing & Verification:**
✅ Per-user isolation verified (Docker exec test)
✅ Memory stats showing real counts (curl test)
✅ Memory API working (facts/episodic loading)
✅ Web UI layout fixed (tab displays correctly)
✅ All 5 services running (llama-swap, llama-swap-amd, qdrant, cat, bot)
✅ Username prepending working (message context for LLM)
**Result:** All Phase 3 critical bugs fixed and verified working.
Key changes:
- CatAdapter (bot/utils/cat_client.py): WebSocket /ws/{user_id} for chat
queries instead of HTTP POST (fixes per-user memory isolation when no
API keys are configured — HTTP defaults all users to user_id='user')
- Memory management API: 8 endpoints for status, stats, facts, episodic
memories, consolidation trigger, multi-step delete with confirmation
- Web UI: Memory tab (tab9) with collection stats, fact/episodic browser,
manual consolidation trigger, and 3-step delete flow requiring exact
confirmation string
- Bot integration: Cat-first response path with query_llama fallback for
both text and embed responses, server mood detection
- Discord bridge plugin: fixed .pop() to .get() (UserMessage is a Pydantic
BaseModelDict, not a raw dict), metadata extraction via extra attributes
- Unified docker-compose: Cat + Qdrant services merged into main compose,
bot depends_on Cat healthcheck
- All plugins (discord_bridge, memory_consolidation, miku_personality)
consolidated into cat-plugins/ for volume mount
- query_llama deprecated but functional for compatibility