Two bugs were causing Miku to call users by wrong names:
BUG 1 - No authoritative source:
Declarative name facts ('The user's name is Lily') were injected into
the prompt without any counterweight. If an old consolidation run
extracted a wrong name, Miku would believe it forever.
Fix: agent_prompt_prefix now appends the user's Discord display name
as AUTHORITATIVE context, with an explicit instruction to prefer it
over any contradictory name facts.
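
A minimal sketch of the idea, not the actual agent_prompt_prefix code.
The helper name build_name_context and the exact wording of the
instruction are assumptions made for illustration:

```python
def build_name_context(display_name: str) -> str:
    """Return a prompt fragment marking the display name as authoritative.

    Hypothetical helper: the real hook and its wording may differ.
    """
    name = display_name.strip()
    if not name:
        # No display name available: add nothing rather than an empty claim.
        return ""
    return (
        f"\n[AUTHORITATIVE] The user's Discord display name is '{name}'. "
        "If any recalled fact gives a different name, prefer this one."
    )


prompt_suffix = build_name_context("koko210Serve")
```

Returning an empty string for a blank name keeps the prompt free of an
"authoritative" statement that carries no information.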
BUG 2 - Dedup prevented name updates:
_is_duplicate_fact() used vector similarity to detect duplicates.
'The user's name is Lily' and 'The user's name is koko210Serve' share
about 80% of their text, so their cosine similarity exceeds the 0.85
duplicate threshold. New, correct name facts were silently rejected
as 'duplicates'.
Fix: name facts now use _find_existing_fact() to compare fact_value
directly. If the name changed, the old fact is deleted and the new one
is stored.
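
The value-based update can be sketched as follows. This uses a plain
in-memory list instead of the real vector store, and the names
find_existing_fact, upsert_name_fact and the fact_type key are
assumptions for illustration only:

```python
from __future__ import annotations

from typing import Optional


def find_existing_fact(store: list[dict], fact_type: str) -> Optional[dict]:
    """Return the first stored fact of the given type, or None."""
    for fact in store:
        if fact["fact_type"] == fact_type:
            return fact
    return None


def upsert_name_fact(store: list[dict], new_name: str) -> str:
    """Compare name values directly instead of embedding similarity."""
    new_name = new_name.strip()
    existing = find_existing_fact(store, "name")
    if existing is not None:
        if existing["fact_value"] == new_name:
            return "unchanged"  # a true duplicate: same value
        store.remove(existing)  # name changed: drop the stale fact
        store.append({"fact_type": "name", "fact_value": new_name})
        return "replaced"
    store.append({"fact_type": "name", "fact_value": new_name})
    return "stored"


facts = [{"fact_type": "name", "fact_value": "Lily"}]
result = upsert_name_fact(facts, "koko210Serve")
```

Because the comparison is on the value itself, two sentences that look
alike but carry different names are no longer treated as duplicates.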
Also: the extraction prompt now includes the user's Discord display
name as a hint, so the LLM knows the authoritative name when extracting
facts during consolidation.
Prevents Miku from confusing her own words with what users said.
User messages stored by discord_bridge now get a '[User]: ' prefix on
page_content, mirroring the existing '[Miku]: ' prefix on Miku's own
responses. When episodic memories are recalled via RAG and injected
into the prompt, the LLM can now clearly distinguish:
[User]: I like pizza
[Miku]: That's great! What toppings do you like?
Without this, raw user text looked identical to Miku's text in the
recalled memory context, causing potential confusion about who said what.
The consolidation classifier strips the [User]: prefix before analyzing
content, so word counts and pattern matching remain accurate.
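
A small sketch of the tagging and stripping steps, assuming plain
string handling. The function names are hypothetical; only the
'[User]: ' prefix itself comes from this change:

```python
USER_PREFIX = "[User]: "


def tag_user_message(text: str) -> str:
    """Prefix a user message before storage, without double-tagging."""
    return text if text.startswith(USER_PREFIX) else USER_PREFIX + text


def strip_user_prefix(page_content: str) -> str:
    """Remove the prefix so classifiers see only the raw message text."""
    if page_content.startswith(USER_PREFIX):
        return page_content[len(USER_PREFIX):]
    return page_content


stored = tag_user_message("I like pizza")
word_count = len(strip_user_prefix(stored).split())
```

Stripping before analysis means the '[User]:' token is not counted as a
word and cannot interfere with exact-match patterns.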
Step 3 of memory system overhaul: smart junk detection.
Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid:
TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM):
50+ exact-match patterns, pure emoji, single char, punctuation-only
TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM):
8+ words, questions with substance, first-person statements,
numbers/dates, links, mentions
TIER 3 - BORDERLINE (batch → LLM for economical classification):
2-7 word messages without clear markers
Compact prompt: ~150-200 tokens per 20-message batch
Safety default: KEEP on any parsing error
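
The tiering can be sketched roughly like this. The pattern set, the
regexes, the "3+ words" rule for questions and the KEEP/DROP reply
format are all assumptions for illustration; the real lists are larger
and the real prompt format may differ:

```python
from __future__ import annotations

import re

# Tiny illustrative subset; the real list has 50+ patterns.
TRIVIAL_EXACT = frozenset({"ok", "lol", "yeah", "hmm", "k"})
FIRST_PERSON = re.compile(r"\b(i|my|me)\b", re.IGNORECASE)
# Digits, links, or Discord user mentions such as <@1234>.
MARKERS = re.compile(r"\d|https?://|<@!?\d+>")


def classify(text: str) -> str:
    """Return 'trivial', 'important' or 'borderline' for one message."""
    stripped = text.strip()
    # Tier 1: single char, punctuation/emoji only, or exact filler match.
    if len(stripped) <= 1:
        return "trivial"
    if not any(ch.isalnum() for ch in stripped):
        return "trivial"
    if stripped.lower() in TRIVIAL_EXACT:
        return "trivial"
    # Tier 2: long messages or messages with clear content markers.
    words = stripped.split()
    if len(words) >= 8:
        return "important"
    if MARKERS.search(stripped) or FIRST_PERSON.search(stripped):
        return "important"
    if stripped.endswith("?") and len(words) >= 3:
        return "important"
    # Tier 3: everything else is batched for the LLM.
    return "borderline"


def parse_llm_verdicts(reply: str, batch_size: int) -> list[bool]:
    """Return keep flags; keep everything if the reply is malformed."""
    lines = [ln.strip().upper() for ln in reply.splitlines() if ln.strip()]
    valid = {"KEEP", "DROP"}
    if len(lines) != batch_size or any(ln not in valid for ln in lines):
        return [True] * batch_size  # safety default: KEEP
    return [ln == "KEEP" for ln in lines]
```

Only messages that fall through both cheap tiers reach the LLM, and a
reply that cannot be parsed never causes a deletion.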
Real-time filtering (discord_bridge) uses conservative heuristics only:
- 1-char, pure reactions, single emoji, custom emoji-only
- 50+ single-word fillers
- Never deletes multi-word messages in real-time
- Philosophy: false negatives (junk stored) are preferable to false
  positives (data lost)
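
A sketch of what such a conservative real-time check might look like.
The filler set is a small illustrative subset and the function name is
hypothetical; the custom emoji pattern assumes Discord's
<:name:id> / <a:name:id> message format:

```python
import re

# Illustrative subset; the real list has 50+ single-word fillers.
FILLERS = frozenset({"ok", "lol", "hmm", "yeah", "nice"})
CUSTOM_EMOJI = re.compile(r"<a?:\w+:\d+>")


def should_skip_realtime(text: str) -> bool:
    """Return True only for messages that are safe to drop immediately."""
    stripped = text.strip()
    if len(stripped.split()) > 1:
        return False  # never drop multi-word messages in real time
    if len(stripped) <= 1:
        return True  # empty, single character, or single emoji
    without_custom = CUSTOM_EMOJI.sub("", stripped)
    if not any(ch.isalnum() for ch in without_custom):
        return True  # reactions, punctuation, or custom emoji only
    return stripped.lower() in FILLERS
```

Anything the check is unsure about is stored and left for the nightly
consolidation pass to judge.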
Consolidation gets the full hybrid pipeline with LLM for borderline
cases, achieving much better accuracy than the old 44% while keeping
token costs minimal (LLM only called during nightly consolidation,
not real-time chat).
Step 1 of memory system overhaul: persona tagging.
- discord_bridge: tag user messages with 'persona' metadata at storage time
- memory_consolidation: tag Miku's own responses with 'persona' metadata
- memory_consolidation: tag declarative facts with source persona during extraction
- memory_consolidation: pass persona context to LLM extraction prompt
- memory_consolidation: annotate cross-persona facts in prompt injection
(e.g., '(learned as Evil Miku)' when Evil facts appear for Normal Miku)
- Web UI: show persona badge (🎤 Miku / 😈 Evil Miku) on facts and episodic
memories in the Memory Management tab
This lets both personas know which version of Miku each memory came from,
enabling Evil Miku to distinguish her own memories from Normal Miku's.
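
The cross-persona annotation could look roughly like the sketch below.
The persona keys, the label table and the function name are
assumptions; only the '(learned as Evil Miku)' wording comes from this
change:

```python
from typing import Optional

# Hypothetical metadata values mapped to display labels.
PERSONA_LABELS = {"miku": "Miku", "evil_miku": "Evil Miku"}


def annotate_fact(
    fact_text: str, source_persona: Optional[str], active_persona: str
) -> str:
    """Append a note when a fact was learned under a different persona."""
    if source_persona is None or source_persona == active_persona:
        return fact_text  # untagged or same persona: no annotation
    label = PERSONA_LABELS.get(source_persona, source_persona)
    return f"{fact_text} (learned as {label})"


line = annotate_fact("The user likes pizza", "evil_miku", "miku")
```

Facts stored before tagging existed carry no persona, so the sketch
leaves them unannotated instead of guessing their origin.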