Implement Evil Miku mode with persistence, fix API event loop issues, and improve formatting

- Added Evil Miku mode with 4 evil moods (aggressive, cunning, sarcastic, evil_neutral)
- Created evil mode content files (evil_miku_lore.txt, evil_miku_prompt.txt, evil_miku_lyrics.txt)
- Implemented persistent evil mode state across restarts (saves to memory/evil_mode_state.json)
- Fixed API endpoints to use client.loop.create_task() to prevent timeout errors
- Added evil mode toggle in web UI with red theme styling
- Modified mood rotation to handle evil mode
- Configured DarkIdol uncensored model for evil mode text generation
- Reduced system prompt redundancy by removing duplicate content
- Added markdown escape for single asterisks (actions) while preserving bold formatting
- Evil mode now persists username, pfp, and nicknames across restarts without re-applying changes
2026-01-02 17:11:58 +02:00
parent b38bdf2435
commit 6ec33bcecb
38 changed files with 5707 additions and 164 deletions
--- a/readmes/COGNEE_INTEGRATION_PLAN.md
+++ b/readmes/COGNEE_INTEGRATION_PLAN.md
@@ -0,0 +1,770 @@
+# Cognee Long-Term Memory Integration Plan
+
+## Executive Summary
+
+**Goal**: Add long-term memory capabilities to Miku using Cognee while keeping the existing fast, JSON-based short-term system.
+
+**Strategy**: Hybrid two-tier memory architecture
+- **Tier 1 (Hot)**: Current system - 8 messages in-memory, JSON configs (0-5ms latency)
+- **Tier 2 (Cold)**: Cognee - Long-term knowledge graph + vectors (50-200ms latency)
+
+**Result**: Best of both worlds - fast responses with deep memory when needed.
+
+---
+
+## Architecture Overview
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│                     Discord Event                            │
+│              (Message, Reaction, Presence)                   │
+└──────────────────────┬──────────────────────────────────────┘
+                       │
+                       ▼
+         ┌─────────────────────────────┐
+         │   Short-Term Memory (Fast)   │
+         │  - Last 8 messages          │
+         │  - Current mood             │
+         │  - Active context           │
+         │  Latency: ~2-5ms            │
+         └─────────────┬───────────────┘
+                       │
+                       ▼
+              ┌────────────────┐
+              │  LLM Response   │
+              └────────┬───────┘
+                       │
+         ┌─────────────┴─────────────┐
+         │                           │
+         ▼                           ▼
+┌────────────────┐         ┌─────────────────┐
+│ Send to Discord│         │  Background Job  │
+└────────────────┘         │  Async Ingestion │
+                           │  to Cognee       │
+                           │  Latency: N/A    │
+                           │  (non-blocking)  │
+                           └─────────┬────────┘
+                                     │
+                                     ▼
+                           ┌──────────────────────┐
+                           │  Long-Term Memory     │
+                           │  (Cognee)            │
+                           │  - Knowledge graph   │
+                           │  - User preferences  │
+                           │  - Entity relations  │
+                           │  - Historical facts  │
+                           │  Query: 50-200ms     │
+                           └──────────────────────┘
+```
+
+---
+
+## Performance Analysis
+
+### Current System Baseline
+```python
+# Short-term memory (in-memory)
+conversation_history.add_message(...)      # ~0.1ms
+messages = conversation_history.format()   # ~2ms
+# JSON config read/write:                    ~1-3ms
+# Total per response:                        ~5-10ms
+```
+
+### Cognee Overhead (Estimated)
+
+#### 1. **Write Operations (Background - Non-blocking)**
+```python
+# These run asynchronously AFTER Discord message is sent
+await cognee.add(message_text)        # 20-50ms
+await cognee.cognify()                # 100-500ms (graph processing)
+```
+**Impact on user**: ✅ NONE - Happens in background
+
+#### 2. **Read Operations (When querying long-term memory)**
+```python
+# Only triggered when deep memory is needed
+results = await cognee.search(query)  # 50-200ms
+```
+**Impact on user**: ⚠️ Adds 50-200ms to response time (only when used)
+
+### Mitigation Strategies
+
+#### Strategy 1: Intelligent Query Decision (Recommended)
+```python
+def should_query_long_term_memory(user_prompt: str, context: dict) -> bool:
+    """
+    Decide if we need deep memory BEFORE querying Cognee.
+    Fast heuristic checks (< 1ms).
+    """
+    # Triggers for long-term memory:
+    triggers = [
+        "remember when",
+        "you said",
+        "last week",
+        "last month",
+        "you told me",
+        "what did i say about",
+        "do you recall",
+        "preference",
+        "favorite",
+    ]
+    
+    prompt_lower = user_prompt.lower()
+    
+    # 1. Explicit memory queries
+    if any(trigger in prompt_lower for trigger in triggers):
+        return True
+    
+    # 2. Very little conversation history: skip the deep search
+    if context.get('messages_in_history', 0) < 3:
+        return False  # Not enough history to need deep search
+    
+    # 3. Question about user preferences
+    if '?' in user_prompt and any(word in prompt_lower for word in ['like', 'prefer', 'think']):
+        return True
+    
+    return False
+```
+
+#### Strategy 2: Parallel Processing
+```python
+async def query_with_hybrid_memory(prompt, user_id, guild_id, channel_id):
+    """Query short-term memory always; add long-term memory only when needed."""
+    
+    # Always get short-term (fast)
+    short_term = conversation_history.format_for_llm(channel_id)
+    
+    # Decide if we need long-term (assumes format_for_llm returns a list of messages)
+    context = {"guild_id": guild_id, "messages_in_history": len(short_term)}
+    if should_query_long_term_memory(prompt, context):
+        # Run the search as a task so it can be cancelled on timeout
+        long_term_task = asyncio.create_task(cognee.search(prompt))
+        
+        # Wait at most 150ms, then fall back to short-term context only
+        try:
+            long_term = await asyncio.wait_for(long_term_task, timeout=0.15)  # 150ms max
+        except asyncio.TimeoutError:
+            long_term = None  # Fallback - proceed without deep memory
+    else:
+        long_term = None
+    
+    # Combine contexts
+    combined_context = merge_contexts(short_term, long_term)
+    
+    return await llm_query(combined_context)
+```
+
+#### Strategy 3: Caching Layer
+```python
+from datetime import datetime, timedelta
+
+# Cache frequent queries for 5 minutes
+_cognee_cache = {}
+_cache_ttl = timedelta(minutes=5)
+
+async def cached_cognee_search(query: str):
+    """Cache Cognee results to avoid repeated queries."""
+    cache_key = query.lower().strip()
+    now = datetime.now()
+    
+    if cache_key in _cognee_cache:
+        result, timestamp = _cognee_cache[cache_key]
+        if now - timestamp < _cache_ttl:
+            print(f"🎯 Cache hit for: {query[:50]}...")
+            return result
+    
+    # Cache miss - query Cognee
+    result = await cognee.search(query)
+    _cognee_cache[cache_key] = (result, now)
+    
+    return result
+```
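
The dictionary cache above never drops expired entries, so it can grow without bound on a busy server. A minimal sketch of a bounded variant follows, assuming a hypothetical `TTLCache` class (not part of the plan or of Cognee) with an injectable clock so the expiry logic can be exercised without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical bounded cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0, max_entries: int = 256,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._max = max_entries
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        """Return the cached value, or None if missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self._clock() - stored_at >= self._ttl:
            del self._data[key]  # Expired: drop it on access
            return None
        return value

    def put(self, key: str, value: Any) -> None:
        """Store a value, pruning expired entries and then the oldest one if full."""
        now = self._clock()
        expired = [k for k, (_, t) in self._data.items() if now - t >= self._ttl]
        for k in expired:
            del self._data[k]
        if key not in self._data and len(self._data) >= self._max:
            oldest = min(self._data, key=lambda k: self._data[k][1])
            del self._data[oldest]
        self._data[key] = (value, now)

    def __len__(self) -> int:
        return len(self._data)
```

In this sketch `get` returns `None` for both a miss and a cached `None`, which is acceptable only if search results are never `None`.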
+
+#### Strategy 4: Tiered Response Times
+```python
+# Set different response strategies based on context
+RESPONSE_MODES = {
+    "instant": {
+        "use_long_term": False,
+        "max_latency": 100,  # ms
+        "contexts": ["reactions", "quick_replies"]
+    },
+    "normal": {
+        "use_long_term": "conditional",  # Only if triggers match
+        "max_latency": 300,  # ms
+        "contexts": ["server_messages", "dm_casual"]
+    },
+    "deep": {
+        "use_long_term": True,
+        "max_latency": 1000,  # ms
+        "contexts": ["dm_deep_conversation", "user_questions"]
+    }
+}
+```
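
One way to consume the `RESPONSE_MODES` table is a pair of lookup helpers. This is only a sketch: `pick_response_mode`, `wants_long_term`, and the fallback to `"normal"` for unknown contexts are assumptions, not existing bot code.

```python
RESPONSE_MODES = {
    "instant": {"use_long_term": False, "max_latency": 100,
                "contexts": ["reactions", "quick_replies"]},
    "normal": {"use_long_term": "conditional", "max_latency": 300,
               "contexts": ["server_messages", "dm_casual"]},
    "deep": {"use_long_term": True, "max_latency": 1000,
             "contexts": ["dm_deep_conversation", "user_questions"]},
}


def pick_response_mode(context_name: str) -> str:
    """Return the mode whose "contexts" list contains context_name."""
    for mode_name, mode in RESPONSE_MODES.items():
        if context_name in mode["contexts"]:
            return mode_name
    return "normal"  # Assumed default for contexts the table does not list


def wants_long_term(mode_name: str, triggers_matched: bool) -> bool:
    """Resolve the True / False / "conditional" setting into a yes-or-no decision."""
    setting = RESPONSE_MODES[mode_name]["use_long_term"]
    if setting == "conditional":
        return triggers_matched
    return bool(setting)
```

Here `triggers_matched` would be the result of `should_query_long_term_memory()` from Strategy 1.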
+
+---
+
+## Integration Points
+
+### 1. Message Ingestion (Background - Non-blocking)
+
+**Location**: `bot/bot.py` - `on_message` event
+
+```python
+@globals.client.event
+async def on_message(message):
+    # ... existing message handling ...
+    
+    # After Miku responds, ingest to Cognee (non-blocking)
+    asyncio.create_task(ingest_to_cognee(
+        message=message,
+        response=miku_response,
+        guild_id=message.guild.id if message.guild else None
+    ))
+    
+    # Continue immediately - don't wait
+```
+
+**Implementation**: New file `bot/utils/cognee_integration.py`
+
+```python
+async def ingest_to_cognee(message, response, guild_id):
+    """
+    Background task to add conversation to long-term memory.
+    Non-blocking - runs after Discord message is sent.
+    """
+    try:
+        # Build rich context document
+        doc = {
+            "timestamp": datetime.now().isoformat(),
+            "user_id": str(message.author.id),
+            "user_name": message.author.display_name,
+            "guild_id": str(guild_id) if guild_id else None,
+            "message": message.content,
+            "miku_response": response,
+            "mood": get_current_mood(guild_id),
+        }
+        
+        # Add to Cognee (async)
+        await cognee.add([
+            f"User {doc['user_name']} said: {doc['message']}",
+            f"Miku responded: {doc['miku_response']}"
+        ])
+        
+        # Process into knowledge graph
+        await cognee.cognify()
+        
+        print(f"✅ Ingested to Cognee: {message.id}")
+        
+    except Exception as e:
+        print(f"⚠️ Cognee ingestion failed (non-critical): {e}")
+```
+
+### 2. Query Enhancement (Conditional)
+
+**Location**: `bot/utils/llm.py` - `query_llama` function
+
+```python
+async def query_llama(user_prompt, user_id, guild_id=None, ...):
+    # Get short-term context (always)
+    short_term = conversation_history.format_for_llm(channel_id, max_messages=8)
+    
+    # Check if we need long-term memory
+    long_term_context = None
+    if should_query_long_term_memory(user_prompt, {"guild_id": guild_id, "messages_in_history": len(short_term)}):
+        try:
+            # Query Cognee with timeout
+            long_term_context = await asyncio.wait_for(
+                cognee_integration.search_long_term_memory(user_prompt, user_id, guild_id),
+                timeout=0.15  # 150ms max
+            )
+        except asyncio.TimeoutError:
+            print("⏱️ Long-term memory query timeout - proceeding without")
+        except Exception as e:
+            print(f"⚠️ Long-term memory error: {e}")
+    
+    # Build messages for LLM
+    messages = list(short_term)  # Copy so the insert below does not mutate short-term history
+    
+    # Inject long-term context if available
+    if long_term_context:
+        messages.insert(0, {
+            "role": "system",
+            "content": f"[Long-term memory context]: {long_term_context}"
+        })
+    
+    # ... rest of existing LLM query code ...
+```
+
+### 3. Autonomous Actions Integration
+
+**Location**: `bot/utils/autonomous.py`
+
+```python
+async def autonomous_tick_v2(guild_id: int):
+    """Enhanced with long-term memory awareness."""
+    
+    # Get decision from autonomous engine (existing fast logic)
+    action_type = autonomous_engine.should_take_action(guild_id)
+    
+    if action_type is None:
+        return
+    
+    # ENHANCEMENT: Check if action should use long-term context
+    context = {}
+    
+    if action_type in ["engage_user", "join_conversation"]:
+        # Get recent server activity from Cognee
+        try:
+            context["recent_topics"] = await asyncio.wait_for(
+                cognee_integration.get_recent_topics(guild_id, hours=24),
+                timeout=0.1  # 100ms max - this is background
+            )
+        except asyncio.TimeoutError:
+            pass  # Proceed without - autonomous actions are best-effort
+    
+    # Execute action with enhanced context
+    if action_type == "engage_user":
+        await miku_engage_random_user_for_server(guild_id, context=context)
+    
+    # ... rest of existing action execution ...
+```
+
+### 4. User Preference Tracking
+
+**New Feature**: Learn user preferences over time
+
+```python
+# bot/utils/cognee_integration.py
+
+async def extract_and_store_preferences(message, response):
+    """
+    Extract user preferences from conversations and store in Cognee.
+    Runs in background - doesn't block responses.
+    """
+    # Simple heuristic extraction (can be enhanced with LLM later)
+    preferences = extract_preferences_simple(message.content)
+    
+    if preferences:
+        for pref in preferences:
+            await cognee.add([{
+                "type": "user_preference",
+                "user_id": str(message.author.id),
+                "preference": pref["category"],
+                "value": pref["value"],
+                "context": message.content[:200],
+                "timestamp": datetime.now().isoformat()
+            }])
+
+def extract_preferences_simple(text: str) -> list:
+    """Fast pattern matching for common preferences."""
+    prefs = []
+    text_lower = text.lower()
+    
+    # Pattern: "I love/like/prefer X"
+    if any(phrase in text_lower for phrase in ("i love", "i like", "i prefer")):
+        # Extract what they love/like
+        # ... simple parsing logic ...
+        pass
+    
+    # Pattern: "my favorite X is Y"
+    if "favorite" in text_lower:
+        # ... extraction logic ...
+        pass
+    
+    return prefs
+```
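
The parsing logic elided in `extract_preferences_simple` could be filled in with regular expressions. The sketch below is one possible approach, not the plan's final design: `extract_preferences_regex` and both patterns are assumptions, and they will produce false positives on phrases such as "I like to think".

```python
import re
from typing import Dict, List

# "I love/like/prefer X", capturing X up to the next punctuation mark
_LIKE_PATTERN = re.compile(r"\bi (love|like|prefer) ([^.!?,]+)", re.IGNORECASE)
# "my favorite X is Y" (also accepts the spelling "favourite")
_FAVORITE_PATTERN = re.compile(r"\bmy favou?rite ([\w ]+?) is ([^.!?,]+)", re.IGNORECASE)


def extract_preferences_regex(text: str) -> List[Dict[str, str]]:
    """Return preferences as dicts with the "category" and "value" keys used above."""
    prefs = []
    for match in _LIKE_PATTERN.finditer(text):
        prefs.append({
            "category": match.group(1).lower(),
            "value": match.group(2).strip(),
        })
    for match in _FAVORITE_PATTERN.finditer(text):
        prefs.append({
            "category": f"favorite {match.group(1).strip().lower()}",
            "value": match.group(2).strip(),
        })
    return prefs
```

The returned dicts use the same `category` and `value` keys that `extract_and_store_preferences` reads.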
+
+---
+
+## Docker Compose Integration
+
+### Add Cognee Services
+
+```yaml
+# Add to docker-compose.yml
+
+  cognee-db:
+    image: postgres:15-alpine
+    container_name: cognee-db
+    environment:
+      - POSTGRES_USER=cognee
+      - POSTGRES_PASSWORD=cognee_pass
+      - POSTGRES_DB=cognee
+    volumes:
+      - cognee_postgres_data:/var/lib/postgresql/data
+    restart: unless-stopped
+    profiles:
+      - cognee  # Optional profile - enable with --profile cognee
+
+  cognee-neo4j:
+    image: neo4j:5-community
+    container_name: cognee-neo4j
+    environment:
+      - NEO4J_AUTH=neo4j/cognee_pass
+      - NEO4J_PLUGINS=["apoc"]
+    ports:
+      - "7474:7474"  # Neo4j Browser (optional)
+      - "7687:7687"  # Bolt protocol
+    volumes:
+      - cognee_neo4j_data:/data
+    restart: unless-stopped
+    profiles:
+      - cognee
+
+volumes:
+  cognee_postgres_data:
+  cognee_neo4j_data:
+```
+
+### Update Miku Bot Service
+
+```yaml
+  miku-bot:
+    # ... existing config ...
+    environment:
+      # ... existing env vars ...
+      - COGNEE_ENABLED=true
+      - COGNEE_DB_URL=postgresql://cognee:cognee_pass@cognee-db:5432/cognee
+      - COGNEE_NEO4J_URL=bolt://cognee-neo4j:7687
+      - COGNEE_NEO4J_USER=neo4j
+      - COGNEE_NEO4J_PASSWORD=cognee_pass
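+    # cognee-db and cognee-neo4j sit behind the `cognee` profile, so start the
+    # stack with `--profile cognee` whenever these dependencies are listed
+    depends_on: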
+      - llama-swap
+      - cognee-db
+      - cognee-neo4j
+```
+
+---
+
+## Performance Benchmarks (Estimated)
+
+### Without Cognee (Current)
+```
+User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
+Total: ~2005ms (LLM dominates)
+```
+
+### With Cognee (Instant Mode - No long-term query)
+```
+User message → Discord event → Short-term lookup (5ms) → LLM query (2000ms) → Response
+Background: Cognee ingestion (150ms) - non-blocking
+Total: ~2005ms (no change - ingestion is background)
+```
+
+### With Cognee (Deep Memory Mode - User asks about past)
+```
+User message → Discord event → Short-term (5ms) + Long-term query (150ms) → LLM query (2000ms) → Response
+Total: ~2155ms (+150ms overhead, but only when explicitly needed)
+```
+
+### Autonomous Actions (Background)
+```
+Autonomous tick → Decision (5ms) → Get topics from Cognee (100ms) → Generate message (2000ms) → Post
+Total: ~2105ms (+100ms, but autonomous actions are already async)
+```
+
+---
+
+## Feature Enhancements Enabled by Cognee
+
+### 1. User Memory
+```python
+# User asks: "What's my favorite anime?"
+# Cognee searches: All messages from user mentioning "favorite" + "anime"
+# Returns: "You mentioned loving Steins;Gate in a conversation 3 weeks ago"
+```
+
+### 2. Topic Trends
+```python
+# Autonomous action: Join conversation
+# Cognee query: "What topics have been trending in this server this week?"
+# Returns: ["gaming", "anime recommendations", "music production"]
+# Miku: "I've noticed you all have been talking about anime a lot lately! Any good recommendations?"
+```
+
+### 3. Relationship Tracking
+```python
+# Knowledge graph tracks:
+# User A → likes → "cats"
+# User B → dislikes → "cats"
+# User A → friends_with → User B
+
+# When Miku talks to both: Avoids cat topics to prevent friction
+```
+
+### 4. Event Recall
+```python
+# User: "Remember when we talked about that concert?"
+# Cognee searches: Conversations with this user + keyword "concert"
+# Returns: "Yes! You were excited about the Miku Expo in Los Angeles in July!"
+```
+
+### 5. Mood Pattern Analysis
+```python
+# Query Cognee: "When does this server get most active?"
+# Returns: "Evenings between 7-10 PM, discussions about gaming"
+# Autonomous engine: Schedule more engagement during peak times
+```
+
+---
+
+## Implementation Phases
+
+### Phase 1: Foundation (Week 1)
+- [ ] Add Cognee to `requirements.txt`
+- [ ] Create `bot/utils/cognee_integration.py`
+- [ ] Set up Docker services (PostgreSQL, Neo4j)
+- [ ] Basic initialization and health checks
+- [ ] Test ingestion in background (non-blocking)
+
+### Phase 2: Basic Integration (Week 2)
+- [ ] Add background ingestion to `on_message`
+- [ ] Implement `should_query_long_term_memory()` heuristics
+- [ ] Add conditional long-term queries to `query_llama()`
+- [ ] Add caching layer
+- [ ] Monitor latency impact
+
+### Phase 3: Advanced Features (Week 3)
+- [ ] User preference extraction
+- [ ] Topic trend analysis for autonomous actions
+- [ ] Relationship tracking between users
+- [ ] Event recall capabilities
+
+### Phase 4: Optimization (Week 4)
+- [ ] Fine-tune timeout thresholds
+- [ ] Implement smart caching strategies
+- [ ] Add Cognee query statistics to dashboard
+- [ ] Performance benchmarking and tuning
+
+---
+
+## Configuration Management
+
+### Keep JSON Files (Hot Config)
+```
+# These remain JSON for instant access:
+- servers_config.json       # Current mood, sleep state, settings
+- autonomous_context.json   # Real-time autonomous state
+- blocked_users.json        # Security/moderation
+- figurine_subscribers.json # Active subscriptions
+
+# Reason: Need instant read/write, changed frequently
+```
+
+### Migrate to Cognee (Historical Data)
+```
+# These can move to Cognee over time:
+- Full DM history (dms/*.json) → Cognee knowledge graph
+- Profile picture metadata → Cognee (searchable by mood)
+- Reaction logs → Cognee (analyze patterns)
+
+# Reason: Historical, queried infrequently, benefit from graph relationships
+```
+
+### Hybrid Approach
+```jsonc
+// servers_config.json - Keep recent data
+{
+  "guild_id": 123,
+  "current_mood": "bubbly",
+  "is_sleeping": false,
+  "recent_topics": ["cached", "from", "cognee"]  // Cache Cognee query results
+}
+```
+
+---
+
+## Monitoring & Observability
+
+### Add Performance Tracking
+
+```python
+# bot/utils/cognee_integration.py
+
+import asyncio
+import time
+from dataclasses import dataclass
+from typing import Optional
+
+@dataclass
+class CogneeMetrics:
+    """Track Cognee performance."""
+    total_queries: int = 0
+    cache_hits: int = 0
+    cache_misses: int = 0
+    avg_query_time: float = 0.0
+    timeouts: int = 0
+    errors: int = 0
+    background_ingestions: int = 0
+
+cognee_metrics = CogneeMetrics()
+
+async def search_long_term_memory(query: str, user_id: str, guild_id: Optional[int]) -> str:
+    """Search with metrics tracking."""
+    start = time.time()
+    cognee_metrics.total_queries += 1
+    
+    try:
+        result = await cached_cognee_search(query)
+        
+        elapsed = time.time() - start
+        cognee_metrics.avg_query_time = (
+            (cognee_metrics.avg_query_time * (cognee_metrics.total_queries - 1) + elapsed) 
+            / cognee_metrics.total_queries
+        )
+        
+        return result
+        
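+    except (asyncio.TimeoutError, asyncio.CancelledError):
+        # A caller's asyncio.wait_for() timeout cancels this coroutine,
+        # which surfaces here as CancelledError rather than TimeoutError
+        cognee_metrics.timeouts += 1
+        raise
+    except Exception:
+        cognee_metrics.errors += 1
+        raise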
+```
+
+### Dashboard Integration
+
+Add to `bot/api.py`:
+
+```python
+@app.get("/cognee/metrics")
+def get_cognee_metrics():
+    """Get Cognee performance metrics."""
+    from utils.cognee_integration import cognee_metrics
+    
+    return {
+        "enabled": globals.COGNEE_ENABLED,
+        "total_queries": cognee_metrics.total_queries,
+        "cache_hit_rate": (
+            cognee_metrics.cache_hits / cognee_metrics.total_queries 
+            if cognee_metrics.total_queries > 0 else 0
+        ),
+        "avg_query_time_ms": cognee_metrics.avg_query_time * 1000,
+        "timeouts": cognee_metrics.timeouts,
+        "errors": cognee_metrics.errors,
+        "background_ingestions": cognee_metrics.background_ingestions
+    }
+```
+
+---
+
+## Risk Mitigation
+
+### Risk 1: Cognee Service Failure
+**Mitigation**: Graceful degradation
+```python
+if not cognee_available():
+    # Fall back to short-term memory only
+    # Bot continues functioning normally
+    return short_term_context_only
+```
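
`cognee_available()` is not defined anywhere in this plan. One way to back it is a small failure tracker that pauses queries after repeated errors. The sketch below is a hypothetical helper; the class name, the threshold, and the cooldown are all assumptions.

```python
import time
from typing import Callable


class CogneeAvailability:
    """Hypothetical helper: pause Cognee queries after repeated failures."""

    def __init__(self, failure_threshold: int = 3, cooldown_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._cooldown = cooldown_seconds
        self._clock = clock
        self._consecutive_failures = 0
        self._paused_until = float("-inf")  # Not paused initially

    def record_success(self) -> None:
        """Reset the failure streak after a successful query."""
        self._consecutive_failures = 0

    def record_failure(self) -> None:
        """Count a failure and start a cooldown once the threshold is reached."""
        self._consecutive_failures += 1
        if self._consecutive_failures >= self._threshold:
            self._paused_until = self._clock() + self._cooldown

    def available(self) -> bool:
        """True unless a cooldown is currently in effect."""
        return self._clock() >= self._paused_until
```

A module-level instance could then serve as `cognee_available = tracker.available`, with the query wrappers calling `record_success()` and `record_failure()`.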
+
+### Risk 2: Increased Latency
+**Mitigation**: Aggressive timeouts + caching
+```python
+MAX_COGNEE_QUERY_TIME = 150  # ms
+# If timeout, proceed without long-term context
+```
+
+### Risk 3: Storage Growth
+**Mitigation**: Data retention policies
+```python
+# Auto-cleanup old data from Cognee
+# Keep: Last 90 days of conversations
+# Archive: Older data to cold storage
+```
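
The retention rule can be sketched independently of how Cognee stores data. The example below assumes records are plain dicts carrying the ISO-format `timestamp` written by `ingest_to_cognee`; `split_by_retention` is a hypothetical helper, and actually deleting or archiving data in Cognee is out of scope here.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_retention(records: List[Dict], now: datetime,
                       keep_days: int = 90) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (keep, archive) around the retention cutoff."""
    cutoff = now - timedelta(days=keep_days)
    keep, archive = [], []
    for record in records:
        # Timestamps are assumed to come from datetime.now().isoformat()
        recorded_at = datetime.fromisoformat(record["timestamp"])
        if recorded_at >= cutoff:
            keep.append(record)
        else:
            archive.append(record)
    return keep, archive
```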
+
+### Risk 4: Context Pollution
+**Mitigation**: Relevance scoring
+```python
+# Only inject Cognee results if confidence > 0.7
+if cognee_result.score < 0.7:
+    # Too irrelevant - don't add to context
+    pass
+```
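
A slightly fuller sketch of the relevance filter is shown below. The shape of Cognee search results is not confirmed in this plan, so `ScoredResult` and its `score` field are stand-ins, and `filter_relevant` is a hypothetical helper. Like the snippet above, it drops only results scoring below 0.7.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoredResult:
    """Stand-in for a search hit; the real result shape may differ."""
    text: str
    score: float


def filter_relevant(results: List[ScoredResult], min_score: float = 0.7,
                    max_results: int = 3) -> List[ScoredResult]:
    """Keep the highest-scoring results at or above min_score."""
    kept = [r for r in results if r.score >= min_score]
    kept.sort(key=lambda r: r.score, reverse=True)
    return kept[:max_results]
```

Capping the count with `max_results` also bounds how much text gets injected into the system prompt.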
+
+---
+
+## Cost-Benefit Analysis
+
+### Benefits
+✅ **Deep Memory**: Recall conversations from weeks/months ago
+✅ **User Preferences**: Remember what users like/dislike
+✅ **Smarter Autonomous**: Context-aware engagement
+✅ **Relationship Graph**: Understand user dynamics
+✅ **Minimal User Impact**: Background ingestion, conditional queries capped at 150ms
+✅ **Scalable**: Conversation history is no longer limited to the 8-message window
+
+### Costs
+⚠️ **Complexity**: +2 services (PostgreSQL, Neo4j)
+⚠️ **Storage**: ~100MB-1GB per month (depending on activity)
+⚠️ **Latency**: +50-150ms when querying (conditional)
+⚠️ **Memory**: +500MB RAM for Neo4j, +200MB for PostgreSQL
+⚠️ **Maintenance**: Additional services to monitor
+
+### Verdict
+✅ **Worth it if**:
+- Your servers have active, long-running conversations
+- Users want Miku to remember personal details
+- You want smarter autonomous behavior based on trends
+
+❌ **Skip it if**:
+- Conversations are mostly one-off interactions
+- Current 8-message context is sufficient
+- Hardware resources are limited
+
+---
+
+## Quick Start Commands
+
+### 1. Enable Cognee
+```bash
+# Start with Cognee services
+docker-compose --profile cognee up -d
+
+# Check Cognee health
+docker-compose logs cognee-neo4j
+docker-compose logs cognee-db
+```
+
+### 2. Test Integration
+```
+# In Discord, test long-term memory:
+User: "Remember that I love cats"
+Miku: "Got it! I'll remember that you love cats! 🐱"
+
+# Later...
+User: "What do I love?"
+Miku: "You told me you love cats! 🐱"
+```
+
+### 3. Monitor Performance
+```bash
+# Check metrics via API
+curl http://localhost:3939/cognee/metrics
+
+# Inspect the knowledge graph in Neo4j Browser (optional)
+# Open browser: http://localhost:7474
+```
+
+---
+
+## Conclusion
+
+**Recommended Approach**: Implement Phases 1-2 first, then evaluate based on real usage patterns.
+
+**Expected Latency Impact**: 
+- 95% of messages: **0ms** (background ingestion only)
+- 5% of messages: **+50-150ms** (when long-term memory explicitly needed)
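
The average cost per message implied by these figures can be worked out directly. The sketch below uses only the estimates from this plan (5% of messages, 50-150ms overhead, ~2005ms baseline); none of them are measurements, and `mean_overhead_ms` is a hypothetical helper.

```python
# Figures taken from the estimates in this plan; none of them are measurements.
LLM_LATENCY_MS = 2000
SHORT_TERM_MS = 5
DEEP_QUERY_SHARE_PERCENT = 5        # Share of messages that trigger a long-term query
DEEP_QUERY_OVERHEAD_MS = (50, 150)  # Best and worst case per deep query


def mean_overhead_ms(share_percent: int, overhead_ms: int) -> float:
    """Average overhead per message when only share_percent% of messages pay it."""
    return share_percent * overhead_ms / 100


low, high = (mean_overhead_ms(DEEP_QUERY_SHARE_PERCENT, ms) for ms in DEEP_QUERY_OVERHEAD_MS)
baseline_ms = SHORT_TERM_MS + LLM_LATENCY_MS
print(f"Mean added latency: {low}-{high} ms on a ~{baseline_ms} ms baseline")
```

Under these assumed figures the average cost is a few milliseconds per message, well under 1% of the LLM-dominated baseline.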
+
+**Key Success Factors**:
+1. ✅ Keep JSON configs for hot data
+2. ✅ Background ingestion (non-blocking)
+3. ✅ Conditional long-term queries only
+4. ✅ Aggressive timeouts (150ms max)
+5. ✅ Caching layer for repeated queries
+6. ✅ Graceful degradation on failure
+
+This hybrid approach gives you deep memory capabilities without sacrificing the snappy response times users expect from Discord bots.