Step 3 of memory system overhaul: smart junk detection.
Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid:
TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM):
50+ exact-match patterns, pure emoji, single char, punctuation-only
TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM):
8+ words, question with substance, first-person statements,
numbers/dates, links, mentions
TIER 3 - BORDERLINE (batched and sent to the LLM for low-cost classification):
2-7 word messages without clear markers
Compact prompt: ~150-200 tokens per 20-message batch
Safety default: KEEP on any parsing error
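A minimal sketch of how the three tiers above could be routed, for illustration only. The names Tier, TRIVIAL_PATTERNS, FIRST_PERSON and classify are hypothetical, the pattern sets are tiny samples rather than the real 50+ list, and the "question with substance" rule (ends with "?" and has at least 3 words) is an assumption. Single words that match no pattern are sent to BORDERLINE here, which the description above does not specify.

```python
from enum import Enum


class Tier(Enum):
    TRIVIAL = "trivial"        # Tier 1: delete without calling the LLM
    IMPORTANT = "important"    # Tier 2: keep without calling the LLM
    BORDERLINE = "borderline"  # Tier 3: queue for batched LLM classification


# Hypothetical samples; the real exact-match list has 50+ entries.
TRIVIAL_PATTERNS = frozenset({"ok", "k", "lol", "yeah", "thanks", "nice"})
FIRST_PERSON = frozenset({"i", "i'm", "i've", "i'll", "my", "me", "we", "our"})


def is_symbol_only(text):
    # No letters or digits at all: covers punctuation-only and plain emoji.
    return not any(ch.isalnum() for ch in text)


def classify(message):
    text = message.strip()
    lowered = text.lower()
    words = lowered.split()

    # Tier 1: single char, punctuation/emoji only, or exact-match filler.
    if len(text) <= 1 or is_symbol_only(text) or lowered in TRIVIAL_PATTERNS:
        return Tier.TRIVIAL

    # Tier 2: any clear marker of substance.
    if len(words) >= 8:
        return Tier.IMPORTANT
    if any(ch.isdigit() for ch in text):  # numbers and dates
        return Tier.IMPORTANT
    if "http://" in lowered or "https://" in lowered or "@" in text:
        return Tier.IMPORTANT  # links and mentions
    if any(w.strip(".,!?") in FIRST_PERSON for w in words):
        return Tier.IMPORTANT
    if text.endswith("?") and len(words) >= 3:  # assumed "substance" rule
        return Tier.IMPORTANT

    # Tier 3: everything else goes to the LLM batch.
    return Tier.BORDERLINE
```

With these sample sets, classify("lol") lands in Tier 1, classify("I moved to Berlin") in Tier 2, and classify("sounds good then") in Tier 3.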
Real-time filtering (discord_bridge) uses conservative heuristics only:
- 1-char, pure reactions, single emoji, custom emoji-only
- 50+ single-word fillers
- Never deletes multi-word messages in real-time
- Philosophy: a false negative (junk stored) is preferable to a false positive (data lost)
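The real-time rules above could look roughly like the sketch below. FILLERS, is_custom_emoji and should_drop_realtime are hypothetical names, and FILLERS is a small sample of the 50+ list. The sketch takes the strictest reading of "never deletes multi-word messages": anything with more than one whitespace-separated token is kept, including runs of several emoji.

```python
# Hypothetical sample; the real filler list has 50+ single words.
FILLERS = frozenset({"ok", "lol", "yeah", "yep", "nope", "hmm", "nice"})


def is_custom_emoji(token):
    # Discord renders custom emoji as <:name:id>, or <a:name:id> when animated.
    if not (token.startswith("<") and token.endswith(">")):
        return False
    parts = token[1:-1].split(":")
    return (
        len(parts) == 3
        and parts[0] in ("", "a")
        and parts[1] != ""
        and parts[2].isdigit()
    )


def should_drop_realtime(message):
    text = message.strip()
    if len(text) <= 1:
        return True  # empty or single character
    tokens = text.split()
    if len(tokens) != 1:
        return False  # multi-word messages are never dropped in real time
    token = tokens[0]
    if is_custom_emoji(token):
        return True
    if not any(ch.isalnum() for ch in token):
        return True  # pure reaction: emoji or punctuation only
    return token.lower() in FILLERS
```

Anything this filter lets through, such as an unknown single word, is simply stored and left for consolidation to judge.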
Consolidation runs the full hybrid pipeline and sends only borderline
cases to the LLM. This improves on the old 44% accuracy while keeping
token costs minimal: the LLM is called only during nightly consolidation,
never in real-time chat.
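A sketch of the borderline step during consolidation, under several assumptions: ask_llm is a hypothetical callable that takes a prompt string and returns the model's reply, the prompt wording and the one-letter-per-message reply format (K or D) are invented for illustration, and treating a failed LLM call like a parsing error is an extension of the KEEP default rather than something stated above.

```python
BATCH_SIZE = 20  # messages per LLM call


def make_batches(messages, size=BATCH_SIZE):
    return [messages[i:i + size] for i in range(0, len(messages), size)]


def build_prompt(batch):
    # Compact prompt: one instruction line, then one numbered line per message.
    lines = ["Label each message K (keep) or D (delete). "
             "Reply with one letter per message, in order, no spaces."]
    lines += [f"{i + 1}. {msg}" for i, msg in enumerate(batch)]
    return "\n".join(lines)


def parse_reply(reply, batch_size):
    letters = reply.strip().upper()
    # Safety default: wrong length or unexpected characters keeps the whole batch.
    if len(letters) != batch_size or any(c not in "KD" for c in letters):
        return [True] * batch_size
    return [c == "K" for c in letters]


def classify_borderline(messages, ask_llm):
    # Returns one boolean per message: True means keep.
    keep = []
    for batch in make_batches(messages):
        try:
            reply = ask_llm(build_prompt(batch))
        except Exception:
            reply = ""  # assumption: a failed call is handled like a bad reply
        keep.extend(parse_reply(reply, len(batch)))
    return keep
```

Returning a flat list of booleans keeps the caller simple: it can zip the result with the original messages and delete only the entries marked False.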