koko210Serve 5a740c9334 feat(memory): hybrid trivial-message classifier (heuristics + LLM batch)
Step 3 of memory system overhaul: smart junk detection.

Replaces the old 37-pattern frozenset (44% accuracy) with a 3-tier hybrid:

TIER 1 - DEFINITELY_TRIVIAL (instant delete, no LLM):
  50+ exact-match patterns, pure emoji, single char, punctuation-only

TIER 2 - DEFINITELY_IMPORTANT (instant keep, no LLM):
  8+ words, question with substance, first-person statements,
  numbers/dates, links, mentions

TIER 3 - BORDERLINE (batch → LLM for economical classification):
  2-7 word messages without clear markers
  Compact prompt: ~150-200 tokens per 20-message batch
  Safety default: KEEP on any parsing error
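
The three tiers above can be sketched as a single heuristic pass. This is a minimal illustration, not the code in this commit: the names `Tier`, `TRIVIAL_PATTERNS`, and `classify` are hypothetical, the pattern set is abbreviated, and the regexes and the "question with substance" threshold (3+ words ending in `?`) are assumptions.

```python
import re
from enum import Enum


class Tier(Enum):
    TRIVIAL = "trivial"        # Tier 1: delete without calling the LLM
    IMPORTANT = "important"    # Tier 2: keep without calling the LLM
    BORDERLINE = "borderline"  # Tier 3: queue for the LLM batch


# Hypothetical, abbreviated set; the commit describes 50+ exact-match patterns.
TRIVIAL_PATTERNS = frozenset({"ok", "lol", "yeah", "thanks", "hi", "brb"})

# Assumed markers for the Tier 2 checks.
FIRST_PERSON = re.compile(r"\b(i|i'm|i've|my|me|mine)\b", re.IGNORECASE)
HAS_DIGIT = re.compile(r"\d")
HAS_LINK = re.compile(r"https?://")
HAS_MENTION = re.compile(r"<@!?\d+>|@\w+")


def is_symbols_only(text: str) -> bool:
    """True when the text has no letters or digits (punctuation or emoji only)."""
    return not any(ch.isalnum() for ch in text)


def classify(message: str) -> Tier:
    text = message.strip()
    words = text.split()

    # Tier 1: single char, punctuation/emoji only, or an exact-match filler.
    if len(text) <= 1 or is_symbols_only(text) or text.lower() in TRIVIAL_PATTERNS:
        return Tier.TRIVIAL

    # Tier 2: long messages or messages carrying a clear marker.
    if (
        len(words) >= 8
        or HAS_LINK.search(text)
        or HAS_MENTION.search(text)
        or HAS_DIGIT.search(text)
        or FIRST_PERSON.search(text)
        or (text.endswith("?") and len(words) >= 3)
    ):
        return Tier.IMPORTANT

    # Tier 3: everything else is left for the LLM batch.
    return Tier.BORDERLINE
```

In this sketch a single word that is not an exact-match filler also falls through to BORDERLINE, which is one possible reading of the tier boundaries.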

Real-time filtering (discord_bridge) uses conservative heuristics only:
  - 1-char, pure reactions, single emoji, custom emoji-only
  - 50+ single-word fillers
  - Never deletes multi-word messages in real-time
  - Philosophy: prefer false negatives (junk stored) over false positives (data lost)
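
A conservative real-time filter along these lines might look like the sketch below. The function name `should_skip_realtime`, the filler set, and the custom-emoji regex (Discord's `<:name:id>` / `<a:name:id>` form) are assumptions for illustration, not the discord_bridge code itself.

```python
import re

# Assumed shape of Discord custom emoji markup: <:name:id> or <a:name:id>.
CUSTOM_EMOJI = re.compile(r"<a?:\w+:\d+>")

# Hypothetical, abbreviated set; the commit describes 50+ single-word fillers.
SINGLE_WORD_FILLERS = frozenset({"ok", "lol", "yeah", "hmm", "nice"})


def should_skip_realtime(message: str) -> bool:
    """Return True only for messages that are safe to drop without an LLM."""
    text = message.strip()

    # Empty or 1-char messages.
    if len(text) <= 1:
        return True

    # Emoji-only messages: once custom emoji markup is removed,
    # no letters or digits remain.
    without_custom = CUSTOM_EMOJI.sub("", text)
    if not any(ch.isalnum() for ch in without_custom):
        return True

    # Never drop multi-word messages in real time.
    words = text.split()
    if len(words) > 1:
        return False

    # Single word: drop only if it is a known filler.
    return words[0].lower() in SINGLE_WORD_FILLERS
```

Anything this function does not recognize is stored, so a mistake here costs storage (junk kept) and never costs data.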

Consolidation gets the full hybrid pipeline, with the LLM handling
borderline cases. This achieves much better accuracy than the old 44%
while keeping token costs minimal: the LLM is called only during
nightly consolidation, never during real-time chat.
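
The borderline batching during consolidation, with KEEP as the safety default, could be sketched as follows. The prompt wording, the `index:label` reply format, and the helper names `chunk`, `build_batch_prompt`, and `parse_batch_reply` are all hypothetical; the actual prompt and parser in this commit may differ, and the LLM call itself is omitted.

```python
from typing import List


def chunk(messages: List[str], size: int = 20) -> List[List[str]]:
    """Split borderline messages into batches of at most `size` messages."""
    return [messages[i:i + size] for i in range(0, len(messages), size)]


def build_batch_prompt(batch: List[str]) -> str:
    """Build a compact prompt asking for one K/D label per numbered message."""
    header = (
        "Label each message K (keep) or D (delete). "
        "Reply as comma-separated index:label pairs.\n"
    )
    body = "\n".join(f"{i}: {m}" for i, m in enumerate(batch, start=1))
    return header + body


def parse_batch_reply(reply: str, batch_size: int) -> List[bool]:
    """Return a keep flag per message; anything unparseable stays KEEP."""
    keep = [True] * batch_size  # safety default: keep everything
    for item in reply.split(","):
        index_text, sep, label = item.strip().partition(":")
        if not sep:
            continue  # malformed pair, leave the default
        try:
            index = int(index_text.strip()) - 1
        except ValueError:
            continue  # non-numeric index, leave the default
        if 0 <= index < batch_size and label.strip().upper() == "D":
            keep[index] = False
    return keep
```

Only an explicit, well-formed `D` label for a valid index deletes a message; a garbled or truncated reply leaves the whole batch stored.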
2026-05-15 14:07:35 +03:00
2025-12-07 17:15:09 +02:00
Description
Llama.cpp-powered Hatsune Miku Discord bot with autonomous features, chat, and image generation
459 MiB
Languages
Python 81.9%
JavaScript 10.3%
HTML 4.1%
Jupyter Notebook 1.2%
Shell 1%
Other 1.5%