feat(memory): add [User]: prefix to user messages for speaker clarity

Prevents Miku from confusing her own words with what users said.

User messages stored by discord_bridge now get a '[User]: ' prefix on page_content, mirroring the existing '[Miku]: ' prefix on Miku's own responses. When episodic memories are recalled via RAG and injected into the prompt, the LLM can now clearly distinguish:

  [User]: I like pizza
  [Miku]: That's great! What toppings do you like?

Without this, raw user text looked identical to Miku's text in the recalled memory context, causing potential confusion about who said what.

The consolidation classifier strips the [User]: prefix before analyzing content, so word counts and pattern matching remain accurate.
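
The round trip described above can be sketched as follows. This is a minimal illustration, not the actual discord_bridge code: `label_user_message` and `strip_user_label` are hypothetical helper names, and only the stripping logic mirrors what the diff below adds to `_classify_message_tier`.

```python
USER_PREFIX = "[User]:"


def label_user_message(raw_text: str) -> str:
    """Hypothetical storage-side helper: prepend the speaker label."""
    return f"{USER_PREFIX} {raw_text}"


def strip_user_label(content: str) -> str:
    """Remove the [User]: label so analysis sees only the message text."""
    text = content.strip()
    if text.startswith(USER_PREFIX):
        text = text[len(USER_PREFIX):].strip()
    return text


stored = label_user_message("I like pizza")   # what would be saved as page_content
analyzed = strip_user_label(stored)           # what the classifier would inspect
word_count = len(analyzed.lower().split())    # counts message words only
```

Stripping before counting matters because the label would otherwise count as one extra word, which could push a very short message over a word-count threshold.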
2026-05-15 14:13:29 +03:00
parent 5a740c9334
commit e7ec82d154
2 changed files with 14 additions and 3 deletions
--- a/cat-plugins/memory_consolidation/memory_consolidation.py
+++ b/cat-plugins/memory_consolidation/memory_consolidation.py
@@ -77,14 +77,20 @@ def _classify_message_tier(content, metadata):
    # Important: NEVER classifies Miku's own messages — those are always kept.
    """
    text = content.strip()
-    text_lower = text.lower()
-    word_count = len(text_lower.split())
-    msg_len = len(text_lower)
    
    # Miku's own messages are always kept (speaker check)
    if metadata.get('speaker') == 'miku' or text.startswith('[Miku]:'):
        return 'keep'
    
+    # Strip [User]: prefix (added by discord_bridge at storage time) so the
+    # classifier analyzes the actual message content, not the label
+    if text.startswith('[User]:'):
+        text = text[len('[User]:'):].strip()
+    
+    text_lower = text.lower()
+    word_count = len(text_lower.split())
+    msg_len = len(text_lower)
+    
    # --- PASS 1: DEFINITELY TRIVIAL ---
    
    # Empty or single char