miku-discord/TESTING_V2.md

Testing Autonomous System V2

Quick Start Guide

Step 1: Enable V2 System (Optional - Test Mode)

The V2 system can run alongside V1 for comparison. To enable it:

Option A: Edit bot.py to start V2 on bot ready

Add this to the on_ready() function in bot/bot.py:

# After existing setup code, add:
from utils.autonomous_v2_integration import start_v2_system_for_all_servers

# Start V2 autonomous system
await start_v2_system_for_all_servers(client)

Option B: Manual API testing (no code changes needed)

Use the API endpoints to inspect what V2 would decide, without actually running its loop.

Step 2: Test the V2 Decision System

Check what V2 is "thinking" for a server:

# Get current social stats
curl http://localhost:3939/autonomous/v2/stats/<GUILD_ID>

# Example response:
{
  "status": "ok",
  "guild_id": 759889672804630530,
  "stats": {
    "loneliness": "0.42",
    "boredom": "0.65",
    "excitement": "0.15",
    "curiosity": "0.20",
    "chattiness": "0.70",
    "action_urgency": "0.48"
  }
}
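Note that the stat values are serialized as strings. If you want to script against this endpoint, a small helper can fetch and convert them; this is an illustrative sketch (the helper names and the use of `urllib` are ours, not part of the bot), assuming the base URL and response shape shown above.

```python
import json
import urllib.request

V2_API = "http://localhost:3939"  # base URL used throughout this guide

def parse_v2_stats(payload: dict) -> dict:
    """Convert the string-encoded stat values into floats."""
    return {name: float(value) for name, value in payload["stats"].items()}

def get_v2_stats(guild_id: int) -> dict:
    """Fetch and parse social stats for one guild (illustrative helper)."""
    with urllib.request.urlopen(f"{V2_API}/autonomous/v2/stats/{guild_id}") as resp:
        return parse_v2_stats(json.load(resp))

# Parsing the example response shown above:
example = {
    "status": "ok",
    "guild_id": 759889672804630530,
    "stats": {"loneliness": "0.42", "boredom": "0.65", "action_urgency": "0.48"},
}
stats = parse_v2_stats(example)
print(stats["boredom"])  # 0.65
```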

Trigger a manual V2 analysis:

# See what V2 would decide right now
curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>

# Example response:
{
  "status": "ok",
  "guild_id": 759889672804630530,
  "analysis": {
    "stats": { ... },
    "interest_score": "0.73",
    "triggers": [
      "KEYWORD_DETECTED (0.60): Interesting keywords: vocaloid, miku",
      "CONVERSATION_PEAK (0.60): Lots of people are chatting"
    ],
    "recent_messages": 15,
    "conversation_active": true,
    "would_call_llm": true
  }
}
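The `would_call_llm` field follows directly from comparing the interest score against the threshold described in Step 4. A minimal sketch of that comparison (the function name is ours; the real logic in autonomous_v2.py may weigh additional factors):

```python
LLM_CALL_THRESHOLD = 0.6  # same default shown in the tuning section

def would_call_llm(analysis: dict, threshold: float = LLM_CALL_THRESHOLD) -> bool:
    """Consult the LLM only when the interest score clears the threshold."""
    return float(analysis["interest_score"]) >= threshold

print(would_call_llm({"interest_score": "0.73"}))  # True
print(would_call_llm({"interest_score": "0.40"}))  # False
```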

Get overall V2 status:

# See V2 status for all servers
curl http://localhost:3939/autonomous/v2/status

# Example response:
{
  "status": "ok",
  "servers": {
    "759889672804630530": {
      "server_name": "Example Server",
      "loop_running": true,
      "action_urgency": "0.52",
      "loneliness": "0.30",
      "boredom": "0.45",
      "excitement": "0.20",
      "chattiness": "0.70"
    }
  }
}

Step 3: Monitor Behavior

Watch for V2 log messages:

docker compose logs -f bot | grep -E "🧠|🎯|🤔"

You'll see messages like:

🧠 Starting autonomous decision loop for server 759889672804630530
🎯 Interest score 0.73 - Consulting LLM for server 759889672804630530
🤔 LLM decision: YES, someone mentioned you (Interest: 0.73)

Compare V1 vs V2:

V1 logs:

💬 Miku said something general in #miku-chat

V2 logs:

🎯 Interest score 0.82 - Consulting LLM
🤔 LLM decision: YES
💬 Miku said something general in #miku-chat

Step 4: Tune the System

Edit bot/utils/autonomous_v2.py to adjust behavior:

# How sensitive is the decision system?
self.LLM_CALL_THRESHOLD = 0.6  # Lower = more responsive (more LLM calls)
self.ACTION_THRESHOLD = 0.5    # Lower = more chatty

# How fast do stats build?
LONELINESS_BUILD_RATE = 0.01    # Higher = gets lonely faster
BOREDOM_BUILD_RATE = 0.01       # Higher = gets bored faster

# Check intervals
MIN_SLEEP = 30    # Seconds between checks during active chat
MAX_SLEEP = 180   # Seconds between checks when quiet
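One plausible way the loop could pick a sleep time between MIN_SLEEP and MAX_SLEEP is simple linear interpolation on channel activity; this is a sketch of the idea, not the actual code in autonomous_v2.py.

```python
MIN_SLEEP = 30   # seconds between checks during active chat
MAX_SLEEP = 180  # seconds between checks when quiet

def check_interval(activity: float) -> float:
    """Interpolate the sleep time from channel activity
    (0.0 = dead quiet, 1.0 = very active). Illustrative only."""
    activity = max(0.0, min(1.0, activity))
    return MAX_SLEEP - activity * (MAX_SLEEP - MIN_SLEEP)

print(check_interval(1.0))  # 30.0 during a busy conversation
print(check_interval(0.0))  # 180.0 when nothing is happening
```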

Step 5: Understanding the Stats

Loneliness (0.0 - 1.0)

  • Increases: When not mentioned for >30 minutes
  • Decreases: When mentioned or otherwise engaged
  • Effect: At 0.7+, seeks attention

Boredom (0.0 - 1.0)

  • Increases: When the channel is quiet or Miku hasn't spoken in >1 hour
  • Decreases: When Miku shares content or a conversation happens
  • Effect: At 0.7+, likely to share tweets/content

Excitement (0.0 - 1.0)

  • Increases: During active conversations
  • Decreases: Fades over time (decays fast)
  • Effect: Higher = more likely to jump into conversation

Curiosity (0.0 - 1.0)

  • Increases: Interesting keywords detected
  • Decreases: Fades over time
  • Effect: High curiosity = asks questions

Chattiness (0.0 - 1.0)

  • Set by mood:
    • excited/bubbly: 0.85-0.9
    • neutral: 0.5
    • shy/sleepy: 0.2-0.3
    • asleep: 0.0
  • Effect: Base multiplier for all interactions
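The build/decay dynamics above can be sketched as a per-tick update. Only the build rates appear in the tuning section; the excitement decay rate here is an assumption, and the real update in autonomous_v2.py likely tracks more state.

```python
LONELINESS_BUILD_RATE = 0.01  # per tick, as in the tuning section
EXCITEMENT_DECAY = 0.05       # assumed decay rate ("fades fast")

def tick(stats: dict, mentioned: bool) -> dict:
    """One illustrative update step: loneliness builds while ignored and
    resets on a mention; excitement decays toward zero. Both are clamped
    to the 0.0 - 1.0 range described above."""
    s = dict(stats)
    s["loneliness"] = 0.0 if mentioned else min(1.0, s["loneliness"] + LONELINESS_BUILD_RATE)
    s["excitement"] = max(0.0, s["excitement"] - EXCITEMENT_DECAY)
    return s

s = {"loneliness": 0.0, "excitement": 0.5}
for _ in range(50):  # 50 quiet ticks with no mention
    s = tick(s, mentioned=False)
print(s)  # loneliness has built to ~0.5, excitement has decayed to 0.0
```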

Step 6: Trigger Examples

Test specific triggers by creating conditions:

Test MENTIONED trigger:

  1. Mention @Miku in the autonomous channel
  2. Check stats: curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>
  3. Should show: "triggers": ["MENTIONED (0.90): Someone mentioned me!"]

Test KEYWORD trigger:

  1. Say "I love Vocaloid music" in channel
  2. Check stats
  3. Should show: "triggers": ["KEYWORD_DETECTED (0.60): Interesting keywords: vocaloid, music"]

Test CONVERSATION_PEAK:

  1. Have 3+ people chat within 5 minutes
  2. Check stats
  3. Should show: "triggers": ["CONVERSATION_PEAK (0.60): Lots of people are chatting"]

Test LONELINESS:

  1. Don't mention Miku for 30+ minutes
  2. Check stats: curl http://localhost:3939/autonomous/v2/stats/<GUILD_ID>
  3. Watch loneliness increase over time

Step 7: Debugging

V2 won't start?

# Check if import works
docker compose exec bot python -c "from utils.autonomous_v2 import autonomous_system_v2; print('OK')"

V2 never calls LLM?

# Check interest scores
curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>

# If interest_score is always < 0.6:
# - Channel might be too quiet
# - Stats might not be building
# - Try mentioning Miku (instant 0.9 score)

V2 calls LLM too much?

# Increase threshold in autonomous_v2.py:
self.LLM_CALL_THRESHOLD = 0.7  # Was 0.6

Performance Monitoring

Expected LLM Call Frequency

Quiet server (few messages):

  • V1: ~10 random calls/day
  • V2: ~2-5 targeted calls/day
  • GPU usage: LOWER with V2

Active server (100+ messages/day):

  • V1: ~10 random calls/day (same)
  • V2: ~10-20 targeted calls/day (responsive to activity)
  • GPU usage: SLIGHTLY HIGHER, but much more relevant

Check GPU Usage

# Monitor GPU while bot is running
nvidia-smi -l 1

# V1: GPU spikes randomly every 15 minutes
# V2: GPU spikes only when something interesting happens

Monitor LLM Queue

If you notice lag:

  1. Check how many LLM calls are queued
  2. Increase LLM_CALL_THRESHOLD to reduce frequency
  3. Increase check intervals for quieter periods

Migration Path

Phase 1: Testing (Current)

  • V1 running (scheduled actions)
  • V2 running (parallel, logging decisions)
  • Compare behaviors
  • Tune V2 parameters

Phase 2: Gradual Replacement

# In server_manager.py, comment out V1 jobs:
# scheduler.add_job(
#     self._run_autonomous_for_server,
#     IntervalTrigger(minutes=15),
#     ...
# )

# Keep V2 running
autonomous_system_v2.start_loop_for_server(guild_id, client)

Phase 3: Full Migration

  • Disable all V1 autonomous jobs
  • Keep only V2 system
  • Keep manual triggers for testing

Troubleshooting

"Module not found: autonomous_v2"

# Restart the bot container
docker compose restart bot

"Stats always show 0.00"

  • V2 decision loop might not be running
  • Check: curl http://localhost:3939/autonomous/v2/status
  • Should show: "loop_running": true

"Interest score always low"

  • Channel might be genuinely quiet
  • Try creating activity: post messages, images, mention Miku
  • Loneliness/boredom build over time (30-60 min)

"LLM called too frequently"

  • Increase thresholds in autonomous_v2.py
  • Check which triggers are firing: use /autonomous/v2/check
  • Adjust trigger scores if needed

API Endpoints Reference

GET  /autonomous/v2/stats/{guild_id}  - Get social stats
GET  /autonomous/v2/check/{guild_id}  - Manual analysis (what would V2 do?)
GET  /autonomous/v2/status            - V2 status for all servers
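For repeated testing, the three endpoints can be wrapped in a thin client; this sketch is ours (the class and method names are not part of the bot) and assumes only the paths and base URL listed above.

```python
import json
import urllib.request

class V2Client:
    """Thin illustrative wrapper over the three V2 endpoints above."""

    def __init__(self, base_url: str = "http://localhost:3939"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        return f"{self.base_url}{path}"

    def _get(self, path: str) -> dict:
        with urllib.request.urlopen(self._url(path)) as resp:
            return json.load(resp)

    def stats(self, guild_id: int) -> dict:
        return self._get(f"/autonomous/v2/stats/{guild_id}")

    def check(self, guild_id: int) -> dict:
        return self._get(f"/autonomous/v2/check/{guild_id}")

    def status(self) -> dict:
        return self._get("/autonomous/v2/status")

client = V2Client()
print(client._url("/autonomous/v2/status"))
# http://localhost:3939/autonomous/v2/status
```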

Next Steps

  1. Run V2 for 24-48 hours
  2. Compare decision quality vs V1
  3. Tune thresholds based on server activity
  4. Gradually phase out V1 if V2 works well
  5. Add dashboard for real-time stats visualization