Previously, when a user replied to Miku's message via Discord's reply
feature, Miku's quoted words were embedded directly into the user's
message text using the format:
[Replying to your message: "Miku's words"] User's response
This caused two problems:
1. The LLM had to parse "your message" to determine the quoted text
was MIKU's words — fragile and frequently misattributed
2. When stored in episodic memory as [User]: ..., Miku's quoted words
were permanently mislabeled under the user's speaker prefix
Now reply context flows through as structured metadata:
- bot/bot.py captures the replied-to text WITHOUT embedding it in the prompt
- cat_client.py passes it as discord_reply_context in the WebSocket payload
- discord_bridge.py injects it as agent_input['reply_context'] — a
CLEARLY LABELED note: [The user is replying to what you (Miku) said — ...]
- miku_personality.py + evil_miku_personality.py render it via
{reply_context} placeholder in the prompt suffix, between memory
context and conversation history
This keeps Miku's words as a separate context note, never mixed into
the user's HumanMessage. Episodic memory only stores the user's actual
words. The fallback path (when Cat is unavailable) also uses a cleaner
format with explicit speaker labels.
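
A minimal sketch of the metadata flow described above, not the actual code.
Only the payload key discord_reply_context comes from this change; the helper
names and the exact wording of the note are assumptions for illustration.

```python
import json


def build_cat_payload(text, reply_context=None):
    """Build the WebSocket message; quoted text travels as metadata only."""
    payload = {"text": text}  # the user's own words, nothing else
    if reply_context:
        payload["discord_reply_context"] = reply_context
    return json.dumps(payload)


def render_reply_note(reply_context):
    """Return the labeled note for the {reply_context} placeholder, or ''."""
    if not reply_context:
        return ""
    # Wording is illustrative; the real note text may differ.
    return f'[The user is replying to what you (Miku) said: "{reply_context}"]'
```

Because the user's text and the quoted text never share a string, whatever is
stored under [User]: contains only what the user typed.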
- bot/Dockerfile: Add ffmpeg to reinstall line after apt-get autoremove
(autoremove was sweeping up ffmpeg as 'no longer needed' after playwright install)
- bot/utils/image_handling.py: Increase video analysis timeout 120s→300s, 6→3 for Tenor GIFs (GTX 1660 VRAM constraint)
- bot/utils/activities.py: Add _activity_changed_at timestamp tracking,
get_current_activity_label() and get_current_activity_fresh() with 30-min decay
- bot/utils/cat_client.py: Pass current Discord activity to Cheshire Cat pipeline
- bot/utils/llm.py: Inject current Discord activity into system prompt
- cat-plugins/*: Forward Discord activity through working_memory to personality plugins
- bot/persona/*/preamble.txt: Add Discord status usage guidelines for character prompts
- llama-swap-rocm-config.yaml: Add qwen3.5 model entry for ComfyUI prompt generation
- AGENTS.md: New project documentation file
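
The 30-minute decay on the activity label could look roughly like the sketch
below. The names _activity_changed_at and get_current_activity_fresh() come
from the list above; set_current_activity(), the now parameter, and the module
layout are hypothetical and exist only to make the example testable.

```python
import time

ACTIVITY_TTL_SECONDS = 30 * 60  # 30-minute decay window

_current_activity = None
_activity_changed_at = 0.0


def set_current_activity(label, now=None):
    """Record a new activity and the moment it changed (hypothetical helper)."""
    global _current_activity, _activity_changed_at
    _current_activity = label
    _activity_changed_at = time.time() if now is None else now


def get_current_activity_fresh(now=None):
    """Return the activity label only if it changed within the TTL."""
    now = time.time() if now is None else now
    if _current_activity is None:
        return None
    if now - _activity_changed_at > ACTIVITY_TTL_SECONDS:
        return None  # stale: do not mention it in the prompt
    return _current_activity
```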
Three interrelated fixes for speaker attribution confusion:
1. Fix misleading episodic memory header (discord_bridge.py):
The Cat core hardcodes '## Context of things the Human said in the past:'
when formatting recalled conversations. Our plugins store BOTH user messages
([User]: prefix) AND Miku's own responses ([Miku]: prefix) in episodic memory.
This misleading header primes the LLM to attribute Miku's words to the user.
Replaced with '## Past conversation excerpts (prefixed by who said what):'
which accurately describes the mixed-speaker content.
2. Tighten episodic recall (discord_bridge.py):
Added before_cat_recalls_episodic_memories hook setting threshold=0.75
(vs default 0.7) to reduce the chance of Miku's own just-uttered response
being recalled on the very next user message, which would feed her own
words back as misleading context.
3. Add role clarification (miku_personality.py & evil_miku_personality.py):
Added a clarifying note after '# Conversation until now:' in the prompt
suffix to explicitly tell the model that 'Human = the user, AI = you (Miku)',
helping it reconcile the two labeling systems (episodic [User]/[Miku] prefixes
vs conversation history Human/AI roles).
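
Fixes 1 and 2 reduce to a string substitution and a config tweak. The sketch
below shows both as plain functions so it runs without the Cat installed; in
the plugin they would live inside hook functions, and the function names and
the shape of the config dict here are assumptions.

```python
CORE_HEADER = "## Context of things the Human said in the past:"
NEW_HEADER = "## Past conversation excerpts (prefixed by who said what):"


def fix_episodic_header(prompt_text):
    """Swap the misleading core header for the mixed-speaker one."""
    return prompt_text.replace(CORE_HEADER, NEW_HEADER)


def tighten_episodic_recall(recall_config):
    """Body of a before_cat_recalls_episodic_memories-style hook."""
    recall_config["threshold"] = 0.75  # default noted above as 0.7
    return recall_config
```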
Preamble:
- Sentence limit 1-3 → 2-4 (revert to original 'sting, then land' range)
- Remove 'if you can say it in one, say it in one' (encouraged lazy dismissals)
- Add engagement rule: 'Always engage with what was said — acknowledge the
question or statement, then twist the knife. Ignoring isn't sharp, it's lazy.'
Suffix:
- Remove '[Keep responses short and cutting — 1-3 sentences. No monologues.]'
The suffix was the LAST thing the model processed, so its brevity hammer
overpowered the preamble's engagement instruction. Preamble alone is enough.
Five targeted fixes:
1. discord_bridge (priority 100): Skip 'cheerful virtual idol' wrapper and
'CRITICAL INSTRUCTION' about facts when evil_mode is active. Evil Miku
gets her own prompt from evil_miku_personality plugin.
2. memory_consolidation (priority 10): Soften fact-usage pressure:
'Use THESE facts when answering' → 'You may reference these facts if
relevant to the conversation'. Also soften username command tone.
3. evil_miku_personality (priority 100→101): Bump above discord_bridge
so Evil Miku's prefix replacement deterministically discards any
Miku-mode wrappers regardless of plugin load order.
4. evil preamble: Restructure for brevity — add 'Be SHORT and SHARP'
declaration, move RESPONSE RULES before mood, tighten sentence limit
from 2-4 to 1-3 with 'if you can say it in one, say it in one.'
5. evil suffix: Add final brevity reminder '[Keep responses short and
cutting — 1-3 sentences. No monologues.]' right before conversation
for maximum recency influence.
Two bugs were causing Miku to call users by wrong names:
BUG 1 - No authoritative source:
Declarative name facts ('The user's name is Lily') were injected into
the prompt without any counterweight. If an old consolidation run
extracted a wrong name, Miku would believe it forever.
Fix: agent_prompt_prefix now appends the user's Discord display name
as AUTHORITATIVE context, with explicit instruction to prefer it over
any contradictory name facts.
BUG 2 - Dedup prevented name updates:
_is_duplicate_fact() used vector similarity to detect duplicates.
'The user's name is Lily' and 'The user's name is koko210Serve' are
~80% identical text, giving high cosine similarity (>0.85 threshold).
New correct name facts were silently rejected as 'duplicates'.
Fix: name facts now use _find_existing_fact() to compare fact_value
directly. If the name changed, old fact is deleted and new one stored.
Also: the extraction prompt now includes the user's Discord display
name as a hint, so the LLM knows the authoritative name when extracting
facts during consolidation.
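
The BUG 2 fix can be illustrated with an in-memory list standing in for the
vector store. This is a sketch under assumptions: the fact_type and fact_value
keys and both function names are hypothetical, and the real
_find_existing_fact() queries stored memories rather than a Python list.

```python
def find_existing_fact(facts, fact_type):
    """Return the stored fact of the given type, or None."""
    for fact in facts:
        if fact["fact_type"] == fact_type:
            return fact
    return None


def store_name_fact(facts, new_name):
    """Replace the name fact when the value changed; return True if stored."""
    existing = find_existing_fact(facts, "name")
    if existing is not None:
        if existing["fact_value"] == new_name:
            return False  # genuinely the same fact, nothing to do
        facts.remove(existing)  # name changed: drop the old fact
    facts.append({"fact_type": "name", "fact_value": new_name})
    return True
```

Comparing values directly sidesteps the problem that two name sentences
differing only in the name score as near-duplicates under cosine similarity.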
Step 4 of memory system overhaul: single source of truth for prompts.
Problem: The system prompt was defined inline in 4 different places:
miku_personality.py, evil_miku_personality.py, llm.py, discord_bridge.py.
These could drift out of sync — and the discord_bridge WebUI
reconstruction was already missing CRITICAL RULES, CHARACTER CONTEXT,
MOOD GUIDELINES, and RESPONSE RULES sections.
Fix:
- Create persona/miku/preamble.txt — canonical normal Miku preamble
- Create persona/evil/preamble.txt — canonical evil Miku preamble
(with {mood_name} and {mood_description} format placeholders)
- All 5 consumers now read from these files:
* miku_personality.py (Cat plugin, primary path)
* evil_miku_personality.py (Cat plugin, primary path)
* discord_bridge.py (WebUI 'Last Prompt' reconstruction)
* llm.py (fallback path, normal Miku)
* evil_mode.py get_evil_system_prompt() (fallback path, evil Miku)
- All consumers include graceful fallbacks if preamble files are missing
- Fixed evil_mode.py discrepancy: 'body and size' now matches canonical
The preamble files are made available in both containers:
bot/persona/ → /app/persona/ (bot, via Dockerfile COPY)
bot/persona/ → /app/cat/data/ (Cat, via docker-compose volume mount)
Editing the preamble file on the host immediately updates the Cat path
(bot path requires rebuild due to COPY).
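
A consumer with a graceful fallback might be structured like this sketch. The
function names and the fallback text are placeholders, not the real ones; only
the {mood_name} and {mood_description} placeholders come from the change above.

```python
from pathlib import Path

FALLBACK_PREAMBLE = "You are Hatsune Miku."  # placeholder fallback text


def render_preamble(template, **placeholders):
    """Fill format placeholders; leave the text untouched if none are given."""
    return template.format(**placeholders) if placeholders else template


def load_preamble(path, fallback=FALLBACK_PREAMBLE, **placeholders):
    """Read the canonical preamble, falling back if the file is missing."""
    try:
        template = Path(path).read_text(encoding="utf-8")
    except OSError:
        template = fallback
    return render_preamble(template, **placeholders)
```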
Prevents Miku from confusing her own words with what users said.
User messages stored by discord_bridge now get a '[User]: ' prefix on
page_content, mirroring the existing '[Miku]: ' prefix on Miku's own
responses. When episodic memories are recalled via RAG and injected
into the prompt, the LLM can now clearly distinguish:
[User]: I like pizza
[Miku]: That's great! What toppings do you like?
Without this, raw user text looked identical to Miku's text in the
recalled memory context, causing potential confusion about who said what.
The consolidation classifier strips the [User]: prefix before analyzing
content, so word counts and pattern matching remain accurate.
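
The prefixing and stripping steps are small enough to sketch directly. The
function names are hypothetical; the prefix string is the one described above.

```python
USER_PREFIX = "[User]: "


def label_user_message(text):
    """Add the speaker prefix at storage time (idempotent)."""
    return text if text.startswith(USER_PREFIX) else USER_PREFIX + text


def strip_user_prefix(page_content):
    """Remove the prefix before the classifier counts words."""
    if page_content.startswith(USER_PREFIX):
        return page_content[len(USER_PREFIX):]
    return page_content
```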
Step 3 of memory system overhaul: smart junk detection.
Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid:
TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM):
50+ exact-match patterns, pure emoji, single char, punctuation-only
TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM):
8+ words, question with substance, first-person statements,
numbers/dates, links, mentions
TIER 3 - BORDERLINE (batch → LLM for economical classification):
2-7 word messages without clear markers
Compact prompt: ~150-200 tokens per 20-message batch
Safety default: KEEP on any parsing error
Real-time filtering (discord_bridge) uses conservative heuristics only:
- 1-char, pure reactions, single emoji, custom emoji-only
- 50+ single-word fillers
- Never deletes multi-word messages in real-time
- Philosophy: prefer false negatives (junk stored) over false positives (data lost)
Consolidation gets the full hybrid pipeline with LLM for borderline
cases, achieving much better accuracy than the old 44% while keeping
token costs minimal (LLM only called during nightly consolidation,
not real-time chat).
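
A heavily simplified sketch of the tier routing, assuming a tiny pattern set
and a few regex markers. The real pattern lists, the "question with substance"
check, and the LLM batching are not shown, and every name here is hypothetical.

```python
import re

TRIVIAL, IMPORTANT, BORDERLINE = "trivial", "important", "borderline"

# Tiny illustrative subset; the change describes 50+ exact-match patterns.
TRIVIAL_PATTERNS = frozenset({"lol", "ok", "k", "hi", "yes", "no", "thanks"})
FIRST_PERSON = re.compile(r"\b(i|i'm|my|me|mine)\b", re.IGNORECASE)
HAS_MARKER = re.compile(r"\d|https?://|<@!?\d+>")  # numbers, links, mentions


def classify(message):
    """Route a message to one of the three tiers without calling an LLM."""
    text = message.strip()
    if len(text) <= 1 or text.lower() in TRIVIAL_PATTERNS:
        return TRIVIAL
    if not any(ch.isalnum() for ch in text):
        return TRIVIAL  # punctuation-only or emoji-only
    if len(text.split()) >= 8:
        return IMPORTANT
    if HAS_MARKER.search(text) or FIRST_PERSON.search(text):
        return IMPORTANT
    return BORDERLINE  # would be batched and sent to the LLM
```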
Step 1 of memory system overhaul: persona tagging.
- discord_bridge: tag user messages with 'persona' metadata at storage time
- memory_consolidation: tag Miku's own responses with 'persona' metadata
- memory_consolidation: tag declarative facts with source persona during extraction
- memory_consolidation: pass persona context to LLM extraction prompt
- memory_consolidation: annotate cross-persona facts in prompt injection
(e.g., '(learned as Evil Miku)' when Evil facts appear for Normal Miku)
- Web UI: show persona badge (🎤 Miku / 😈 Evil Miku) on facts and episodic
memories in the Memory Management tab
This lets both personas know which version of Miku each memory came from,
enabling Evil Miku to distinguish her own memories from Normal Miku's.
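
The cross-persona annotation could be as simple as the sketch below. The
persona keys ("miku", "evil") and the function name are assumptions; the
'(learned as Evil Miku)' wording follows the example above.

```python
PERSONA_LABELS = {"miku": "Miku", "evil": "Evil Miku"}


def annotate_fact(fact_text, source_persona, active_persona):
    """Mark facts that were learned by the other persona."""
    if source_persona == active_persona or source_persona not in PERSONA_LABELS:
        return fact_text  # same persona, or untagged legacy memory
    return f"{fact_text} (learned as {PERSONA_LABELS[source_persona]})"
```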
- Fix silent None return in analyze_image_with_vision exception handler
- Add None/empty guards after vision analysis in bot.py (image, video, GIF, Tenor)
- Route all image/video/GIF responses through Cheshire Cat pipeline (was
calling query_llama directly), enabling episodic memory storage for media
interactions and correct Last Prompt display in Web UI
- Add media_type parameter to cat_adapter.query() and forward as
discord_media_type in WebSocket payload
- Update discord_bridge plugin to read media_type from payload and inject
MEDIA NOTE into system prefix in before_agent_starts hook
- Add _extract_vision_question() helper to strip Discord mentions and bot-name
triggers from user message; pass cleaned question to vision model so specific
questions (e.g. 'what is the person wearing?') go directly to the vision model
instead of the generic 'Describe this image in detail.' fallback
- Pass user_prompt to all analyze_image_with_qwen / analyze_video_with_vision
call sites in bot.py (image, video, GIF, Tenor, embed paths)
- Fix autonomous reaction loops skipping messages that @mention the bot or have
media attachments in DMs, preventing duplicate vision model calls for images
already being processed by the main message handler
- Increase vision max_tokens: images 300->800, video/GIF 400->1000 (no VRAM
impact; KV cache is pre-allocated at model load time)
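
A rough sketch of what a helper like _extract_vision_question() might do. The
regexes are assumptions, in particular the bot-name trigger pattern; the
fallback string is the generic prompt quoted above.

```python
import re

MENTION_RE = re.compile(r"<@[!&]?\d+>")  # Discord user/role mention markup
BOT_NAME_RE = re.compile(r"^\s*(?:hey\s+)?miku\b[\s,:!]*", re.IGNORECASE)
DEFAULT_QUESTION = "Describe this image in detail."


def extract_vision_question(message_text):
    """Strip mentions and a leading bot-name trigger; fall back if empty."""
    text = MENTION_RE.sub("", message_text)
    text = BOT_NAME_RE.sub("", text).strip()
    return text if text else DEFAULT_QUESTION
```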
- discord_bridge before_agent_starts now checks evil_mode from
working_memory to load the correct personality files:
Normal: miku_lore/prompt/lyrics + /app/moods/{mood}.txt
Evil: evil_miku_lore/prompt/lyrics + /app/moods/evil/{mood}.txt
- Reads files directly instead of relying on cross-plugin working_memory
- cat_client.query() returns (response, full_prompt) tuple
- Full prompt includes system prefix + recalled memories + conversation
- API /prompt/cat returns full_prompt field
- Added empty settings.json required by Cat plugin system
- Plugin now appears in ACTIVE PLUGINS list
- Enabled via /plugins/toggle API endpoint
- Ready to inject PFP descriptions when user asks about it
- Create profile_picture_context plugin to detect PFP queries via regex
- Inject current_description.txt only when user asks about profile picture
- Mount bot/memory directory in Cat container for PFP access
- Avoids context bloat by only adding PFP description when relevant
- Patterns match: 'what does your pfp look like', 'describe your avatar', etc.
- Works seamlessly with existing profile picture update system
- No manual sync needed - description auto-updates with PFP changes
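
Conditional injection of the PFP description might look like this sketch. The
regex is a guess at the kind of pattern used, and the function name and note
format are hypothetical.

```python
import re

PFP_QUERY_RE = re.compile(
    r"\b(?:pfp|avatar|profile\s+(?:picture|pic|photo))\b", re.IGNORECASE
)


def maybe_add_pfp_context(prefix, user_text, description):
    """Append the description only when the user asks about the PFP."""
    if description and PFP_QUERY_RE.search(user_text):
        return f"{prefix}\n\n[Your current profile picture: {description}]"
    return prefix  # unrelated message: keep the prompt lean
```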
MOOD SYSTEM FIX:
- Mount bot/moods directory in docker-compose.yml for Cat container access
- Update miku_personality plugin to load mood descriptions from .txt files
- Add Cat logger for debugging mood loading (replaces print statements)
- Moods now dynamically loaded from working_memory instead of hardcoded neutral
Root cause: The miku_personality plugin's agent_prompt_suffix hook was returning
an empty string, which wiped out the {declarative_memory} and {episodic_memory}
placeholders from the prompt template. This caused the LLM to never receive any
stored facts about users, resulting in hallucinated responses.
Changes:
- miku_personality: Changed agent_prompt_suffix to return the memory context
section with {episodic_memory}, {declarative_memory}, and {tools_output}
placeholders instead of empty string
- discord_bridge: Added before_cat_recalls_declarative_memories hook to increase
k-value from 3 to 10 and lower threshold from 0.7 to 0.5 for better fact
retrieval. Added agent_prompt_prefix to emphasize factual accuracy. Added
debug logging via before_agent_starts hook.
Result: Miku now correctly recalls user facts (favorite songs, games, etc.)
from declarative memory in every test query listed below.
Tested with:
- 'What is my favorite song?' → Correctly answers 'Monitoring (Best Friend Remix) by DECO*27'
- 'Do you remember my favorite song?' → Correctly recalls the song
- 'What is my favorite video game?' → Correctly answers 'Sonic Adventure'
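
Why an empty suffix lost the facts, as a sketch: the placeholders are the only
route by which recalled memories reach the prompt. The suffix text and
function names are illustrative; the placeholder names and the k/threshold
values come from the change description.

```python
MEMORY_SUFFIX = """
# Context
{episodic_memory}
{declarative_memory}
{tools_output}
"""


def agent_prompt_suffix_sketch():
    """Return a suffix that keeps the memory placeholders intact."""
    return MEMORY_SUFFIX


def widen_declarative_recall(recall_config):
    """Body of a before_cat_recalls_declarative_memories-style hook."""
    recall_config["k"] = 10          # was 3
    recall_config["threshold"] = 0.5  # was 0.7
    return recall_config
```

Formatting an empty string with the same keyword arguments yields an empty
string, so with the old suffix the recalled facts had nowhere to land.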
**Critical Bug Fixes:**
1. Per-user memory isolation bug
- Changed CatAdapter from HTTP POST to WebSocket /ws/{user_id}
- user_id now comes from the URL path parameter (true per-user isolation)
- Verified: Different users can't see each other's memories
2. Memory API 405 errors
- Replaced non-existent Cat endpoint calls with Qdrant direct queries
- get_memory_points(): Now uses POST /collections/{collection}/points/scroll
- delete_memory_point(): Now uses POST /collections/{collection}/points/delete
3. Memory stats showing null counts
- Reimplemented get_memory_stats() to query Qdrant directly
- Now returns accurate counts: episodic: 20, declarative: 6, procedural: 4
4. Miku couldn't see usernames
- Modified discord_bridge before_cat_reads_message hook
- Prepends [Username says:] to every message text
- LLM now knows who is texting: [Alice says:] Hello Miku!
5. Web UI Memory tab layout
- Tab9 was positioned outside .tab-container div (showed to the right)
- Moved tab9 HTML inside container, before closing divs
- Memory tab now displays below tab buttons like other tabs
**Code Changes:**
bot/utils/cat_client.py:
- Line 25: Logger name changed to 'llm' (available component)
- get_memory_stats() (lines 256-285): Query Qdrant directly via HTTP GET
- get_memory_points() (lines 275-310): Use Qdrant POST /points/scroll
- delete_memory_point() (lines 350-370): Use Qdrant POST /points/delete
cat-plugins/discord_bridge/discord_bridge.py:
- Fixed .pop() → .get() (UserMessage is Pydantic BaseModelDict)
- Added before_cat_reads_message logic to prepend [Username says:]
- Message format: [Alice says:] message content
Dockerfile.llamaswap-rocm:
- Lines 37-44: Added conditional check for UI directory
- if [ -d ui ] before npm install && npm run build
- Fixes build failure when llama-swap UI dir doesn't exist
bot/static/index.html:
- Moved tab9 from lines 1554-1688 (outside container)
- To position before container closing divs (now inside)
- Memory tab button at line 673: 🧠 Memories
**Testing & Verification:**
✅ Per-user isolation verified (Docker exec test)
✅ Memory stats showing real counts (curl test)
✅ Memory API working (facts/episodic loading)
✅ Web UI layout fixed (tab displays correctly)
✅ All 5 services running (llama-swap, llama-swap-amd, qdrant, cat, bot)
✅ Username prepending working (message context for LLM)
**Result:** All Phase 3 critical bugs fixed and verified working.
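
For reference, the two Qdrant REST calls named above can be sketched as
request builders. Nothing is sent here; the base URL and collection name in
the usage are assumptions, and the function names are hypothetical.

```python
def scroll_request(base_url, collection, limit=100, offset=None):
    """Build URL and JSON body for POST /collections/{collection}/points/scroll."""
    url = f"{base_url}/collections/{collection}/points/scroll"
    body = {"limit": limit, "with_payload": True, "with_vector": False}
    if offset is not None:
        body["offset"] = offset  # continue from a previous page
    return url, body


def delete_request(base_url, collection, point_ids):
    """Build URL and JSON body for POST /collections/{collection}/points/delete."""
    url = f"{base_url}/collections/{collection}/points/delete"
    return url, {"points": list(point_ids)}
```

Either tuple can be passed to an HTTP client of choice, for example as the
url and json arguments of a POST call.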
Key changes:
- CatAdapter (bot/utils/cat_client.py): WebSocket /ws/{user_id} for chat
queries instead of HTTP POST (fixes per-user memory isolation when no
API keys are configured — HTTP defaults all users to user_id='user')
- Memory management API: 8 endpoints for status, stats, facts, episodic
memories, consolidation trigger, multi-step delete with confirmation
- Web UI: Memory tab (tab9) with collection stats, fact/episodic browser,
manual consolidation trigger, and 3-step delete flow requiring exact
confirmation string
- Bot integration: Cat-first response path with query_llama fallback for
both text and embed responses, server mood detection
- Discord bridge plugin: fixed .pop() to .get() (UserMessage is a Pydantic
BaseModelDict, not a raw dict), metadata extraction via extra attributes
- Unified docker-compose: Cat + Qdrant services merged into main compose,
bot depends_on Cat healthcheck
- All plugins (discord_bridge, memory_consolidation, miku_personality)
consolidated into cat-plugins/ for volume mount
- query_llama deprecated but functional for compatibility
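
The per-user isolation comes from putting the user id in the WebSocket path.
A sketch of the URL construction follows; the host, port, and function name
are assumptions, and only the /ws/{user_id} path is taken from the change.

```python
from urllib.parse import quote


def cat_ws_url(host, port, user_id):
    """Build the per-user WebSocket URL, escaping the id for use in a path."""
    return f"ws://{host}:{port}/ws/{quote(str(user_id), safe='')}"
```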