Step 3 of memory system overhaul: smart junk detection.
Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid:
TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM):
50+ exact-match patterns, pure emoji, single char, punctuation-only
TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM):
8+ words, question with substance, first-person statements,
numbers/dates, links, mentions
TIER 3 - BORDERLINE (batched and sent to the LLM for low-cost classification):
2-7 word messages without clear markers
Compact prompt: ~150-200 tokens per 20-message batch
Safety default: KEEP on any parsing error
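A minimal sketch of how the three tiers could be routed, for illustration only. The names (Tier, classify, TRIVIAL_EXACT) and the pattern subset are hypothetical, and the thresholds for "question with substance" (3+ words ending in "?") and for unmatched single words (sent to BORDERLINE) are assumptions, not the committed logic:

```python
import re
from enum import Enum


class Tier(Enum):
    TRIVIAL = "trivial"        # delete without calling the LLM
    IMPORTANT = "important"    # keep without calling the LLM
    BORDERLINE = "borderline"  # queue for batched LLM classification


# Hypothetical subset; the commit describes 50+ exact-match patterns.
TRIVIAL_EXACT = frozenset({"lol", "ok", "k", "yeah", "nice", "haha", "hmm"})

# Assumed markers for links and Discord-style user mentions.
LINK_OR_MENTION = re.compile(r"https?://\S+|<@!?\d+>")
FIRST_PERSON = re.compile(r"\b(i|my|me|mine)\b", re.IGNORECASE)
HAS_DIGIT = re.compile(r"\d")


def classify(text: str) -> Tier:
    stripped = text.strip()
    words = stripped.split()

    # Tier 1: single char, punctuation-only or pure emoji, exact-match filler.
    if len(stripped) <= 1:
        return Tier.TRIVIAL
    if not any(ch.isalnum() for ch in stripped):
        return Tier.TRIVIAL
    if stripped.lower() in TRIVIAL_EXACT:
        return Tier.TRIVIAL

    # Tier 2: long messages or messages carrying a clear marker.
    if len(words) >= 8:
        return Tier.IMPORTANT
    if LINK_OR_MENTION.search(stripped) or HAS_DIGIT.search(stripped):
        return Tier.IMPORTANT
    if FIRST_PERSON.search(stripped):
        return Tier.IMPORTANT
    if stripped.endswith("?") and len(words) >= 3:  # assumed "substance" rule
        return Tier.IMPORTANT

    # Tier 3: everything else goes to the batched LLM pass.
    return Tier.BORDERLINE
```

Checking the cheap Tier 1 rules first means most junk never reaches the regex checks, and anything the heuristics cannot decide falls through to BORDERLINE rather than being deleted.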
Real-time filtering (discord_bridge) uses conservative heuristics only:
- 1-char, pure reactions, single emoji, custom emoji-only
- 50+ single-word fillers
- Never deletes multi-word messages in real-time
- Philosophy: false negatives (junk stored) are preferable to false positives (data lost)
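The real-time rules above can be sketched roughly as follows. This is an illustration under assumptions, not the discord_bridge code: should_skip_realtime and FILLERS are hypothetical names, the filler set is a small stand-in for the 50+ entries, and treating any single token without letters or digits as an emoji or reaction is an assumption:

```python
import re

# Hypothetical subset of the 50+ single-word fillers.
FILLERS = frozenset({"lol", "ok", "yeah", "nice", "hmm", "lmao"})

# One or more Discord-style custom emoji, e.g. <:name:1234> or <a:name:1234>.
CUSTOM_EMOJI_ONLY = re.compile(r"(?:<a?:\w+:\d+>)+")


def should_skip_realtime(text: str) -> bool:
    """Return True only when the message is safe to drop before storage."""
    stripped = text.strip()

    # Empty or 1-char messages (this also covers a single-codepoint emoji).
    if len(stripped) <= 1:
        return True

    # Never drop multi-word messages in real time.
    if len(stripped.split()) > 1:
        return False

    # Single token made up only of custom emoji.
    if CUSTOM_EMOJI_ONLY.fullmatch(stripped):
        return True

    # Single token with no letters or digits: emoji or punctuation reaction.
    if not any(ch.isalnum() for ch in stripped):
        return True

    # Single-word filler.
    return stripped.lower() in FILLERS
```

The multi-word check comes before every deletion rule except the 1-char case, so a bug in a later rule cannot cause a multi-word message to be lost.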
Consolidation runs the full hybrid pipeline and sends only borderline
cases to the LLM. This improves on the old 44% accuracy while keeping
token costs low, because the LLM is called only during nightly
consolidation, never during real-time chat.
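One possible shape for the borderline batching step, shown as a sketch. The prompt wording, the K/D reply format, and the function names are all assumptions made for illustration. The only behaviour taken from this commit is the batch size of 20 and the KEEP default on any parsing error:

```python
from typing import Iterator, List

BATCH_SIZE = 20  # batch size named in this commit


def batches(items: List[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` messages."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_prompt(messages: List[str]) -> str:
    """Build a compact prompt; the wording here is hypothetical."""
    numbered = "\n".join(f"{i}:{m}" for i, m in enumerate(messages, 1))
    return (
        "Label each chat message K (keep) or D (delete as junk). "
        "Reply with one 'number:label' per line.\n" + numbered
    )


def parse_labels(reply: str, count: int) -> List[bool]:
    """Return one keep-flag per message; unparsable lines leave KEEP in place."""
    keep = [True] * count  # safety default: KEEP
    for line in reply.splitlines():
        index, sep, label = line.partition(":")
        if not sep:
            continue  # no 'number:label' shape, ignore the line
        try:
            position = int(index.strip())
        except ValueError:
            continue  # non-numeric index, ignore the line
        if 1 <= position <= count and label.strip().upper() == "D":
            keep[position - 1] = False
    return keep
```

Starting from an all-KEEP list and flipping entries only on an explicit, well-formed "D" means a truncated or malformed reply can never delete a message.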
Step 1 of memory system overhaul: persona tagging.
- discord_bridge: tag user messages with 'persona' metadata at storage time
- memory_consolidation: tag Miku's own responses with 'persona' metadata
- memory_consolidation: tag declarative facts with source persona during extraction
- memory_consolidation: pass persona context to LLM extraction prompt
- memory_consolidation: annotate cross-persona facts in prompt injection
(e.g., '(learned as Evil Miku)' when Evil facts appear for Normal Miku)
- Web UI: show persona badge (🎤 Miku / 😈 Evil Miku) on facts and episodic
memories in the Memory Management tab
This lets both personas know which version of Miku each memory came from,
enabling Evil Miku to distinguish her own memories from Normal Miku's.
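A rough sketch of the tagging and cross-persona annotation described above. The persona keys ("miku", "evil_miku"), the record layout, and the function names are hypothetical. Only the 'persona' metadata key and the '(learned as Evil Miku)' annotation format come from this commit:

```python
from typing import Any, Dict

# Hypothetical persona keys mapped to display names.
PERSONA_LABELS = {"miku": "Miku", "evil_miku": "Evil Miku"}


def tag_with_persona(metadata: Dict[str, Any], persona: str) -> Dict[str, Any]:
    """Return a copy of the metadata with the 'persona' key set."""
    tagged = dict(metadata)
    tagged["persona"] = persona
    return tagged


def format_fact(fact: Dict[str, Any], active_persona: str) -> str:
    """Render a fact for prompt injection, annotating cross-persona facts."""
    text = fact["text"]
    source = fact.get("metadata", {}).get("persona")
    # Annotate only when the fact was learned by the other persona.
    if source and source != active_persona:
        label = PERSONA_LABELS.get(source, source)
        return f"{text} (learned as {label})"
    return text
```

For example, a fact tagged "evil_miku" that is injected while Normal Miku is active would render with the "(learned as Evil Miku)" suffix, while untagged facts and same-persona facts are left unchanged.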